REMARKS 

The claims have been amended to further clarify the meaning of the term "associated 
values" so that the claim limitations relate more clearly to practical applications for analysis of 
data acquired from biological samples. As amended, the claims now state that the 
associated values are acquired by a process in which biological samples containing a plurality of 
genes are hybridized to one or more microarrays of probes, thus measuring the levels of mRNA 
or protein in the biological samples. 

35 U.S.C. 101 REJECTION 

Claims 1-22, 28-30, 33, 44, 46, 58 and 60 remain rejected under 35 U.S.C. 101. New 
Claims 65 and 66 are also rejected as directed to non-statutory subject matter. The rejection is 
respectfully traversed. 

The Examiner is of the opinion that the claimed invention of the rejected claims does not 
produce a useful, concrete and tangible result, but rather merely encompasses combinations of 
groups of data about statistical differences in mRNA or protein levels, with no specific output 
that meets the concrete, tangible and useful criteria, or merely describes "functional descriptive 
material." We disagree insofar as the rejection is applied to the claims as amended. We believe 
that one skilled in the art would recognize that the claimed invention of the Claims as amended 
produces a useful, concrete and tangible result. 

As clearly explained in the present Application, one of the major problems in analysis of 
gene related data is that the process normally used in collecting information to find statistically 
significant biological phenomena in biological samples is by its very nature noisy. Thus, it is 
difficult to distinguish measured variations in mRNA or protein levels due to the noise 
inherent in the process from variations in mRNA or protein levels caused by statistically 
significant biological phenomena. This is overcome in the claims as amended by deriving or 
providing an expected value of the parameter, where the expected value is indicative of the extent of 
variations in the parameter introduced by the data collection process itself. Only when the 
observed phenomenon results in values of the parameter that are significant compared to such 
expected value will the gene exhibiting such characteristics be identified to be associated with 
statistically significant biological phenomena. This results in a better success rate of detecting 
bona fide changes in gene expression and a reduced false discovery rate in the examples 
described in the present application. This limitation is present in all of the rejected Independent 
Claims. An embodiment of this feature is described in more detail on pages 12-14 of the 
specification, and the improved results over conventional techniques on pages 15-18. 

The above feature is taken one step further in claims 28, 46, 60 and 66. This is illustrated 
in the embodiment described on pages 12-14 of the present application. The sets of values of the 
relative difference d(i) in expression of the genes are permuted to arrive at sets of relative 
difference values different from the original sets. The values in the new sets are then ranked, and 
an expected value of such relative difference for each rank is provided. Thus, comparing the 
largest relative difference among all the genes to the largest relative differences from the 
permutations provides one possible test for identifying genes to be of statistical significance. 
Therefore, the average of the largest relative differences from the permutations is the expected 
relative difference for such gene. A comparison of the relative difference of such gene with its 
expected value can be used as control as to whether statistical significance should be assigned to 
such gene. The same reasoning applies to the gene of the second highest relative difference and 
comparison to the second largest relative differences from the permutations, and so on for all the 
genes involved in the calculation. This process involving ranking the relative difference values 
further enhances the ability to distinguish biological phenomena from noise inherent in the data 
collection and analysis. 

Another difficulty in making use of microarray data is that the expression 
levels of the genes have a wide range of values or scattered values. Another limitation present in 
some of the Claims as amended solves this problem by adjusting the parameters of the plurality of 
genes so that variables related to the parameters are substantially independent of variations of 
scattered values or average associated values of the genes over the sets. The scattered values are 
defined by the standard deviation of the associated values in the sets. In the embodiment described 
on pages 11 and 12 of the present specification, this is performed by adjusting the value of s0 in 
equation 1 on page 10 so that the parameter d(i) is substantially independent of the wide 
variations and scattered values or average associated values of the genes, so that all of the 
microarray data can be effectively used. 

The above-described limitations in the Claims as amended are clearly described in the 
paper entitled "Significance analysis of microarrays applied to the ionizing radiation response," 
by Virginia Goss Tusher, et al., Proceedings of the National Academy of Sciences of the United 
States of America (PNAS), (published online before print, April 17, 2001, 
10.1073/pnas.091062498), PNAS, April 24, 2001, Volume 98, No. 9, pages 5116-5121. As will be noted, the 
content of this article is essentially captured in the present Application, and the inventors of this 
application are its authors. In the present Application and in this article, the 
various techniques described in the article are referred to as "SAM," and this article is referred to 
herein as the "SAM article." 

A copy of the SAM article is attached, along with a printout of the references that cite or 
quote this article. As the printout shows, roughly 500 published articles referred to the SAM 
article as of September 2005. This makes the article one of the most cited 
published articles in its field. Enclosed are two of the articles that refer to 
the SAM article. 

Attached is the article entitled "An expression signature for p53 status in human breast 
cancer predicts mutation status, transcriptional effects, and patient survival," by Lance Miller et 
al. (published online before print September 2, 2005, 10.1073/pnas.0506230102), PNAS, 
September 20, 2005, Volume 102, No. 38, pages 13550-13555. As stated on page 6 of this 
article: "Univariate analysis by statistical analysis of microarrays (SAM) (22) identified 6,545 
Affymetrix probe sets representing ~ 5,290 distinct genes whose expression patterns 
distinguished P53 mt and wt tumors with a false discovery rate (q value) < 1% and d score 
(modified t statistic) > 2.0 . . ., further illuminating the extensive nature of the molecular variation 
underlying p53 status." Reference to "(SAM)(22)" in the quote refers to the above SAM 
article. 

SAM software that implements the features described in the present application and the 
SAM article has been widely licensed since 2001. Attached is a Bulletin of information 
available on the internet on licensing such software. 

Attached is a declaration by Dr. Gilbert Chu, one of the inventors of the present 
application, stating that the SAM software that has been licensed to the public implements the 
two claim limitations in the claims as amended discussed above. Dr. Chu further states in his 
declaration that he believes that the analysis using SAM in the above quote from the Miller 
article employs the two claim limitations discussed above through the use of SAM software. 

Thus, as can be seen from the above quote, the authors of the Miller, et al. article use 
features of the two claim limitations in the claims as amended to identify about 5290 distinct 
genes whose expression patterns distinguished the p53 gene tumors. In other words, by making 
use of the above-described two claim limitations of the Claims as amended, without more, the 
authors of the Miller, et al. article identified around 5290 distinct genes whose associated values 
differ by an amount of statistical significance among the data, for distinguishing the p53 gene 
tumors. As described in this article by Miller, et al., certain correlations that are useful for 
analyzing human breast cancer and for predicting mutation status, transcriptional effects and patient 
survival are then developed. Thus, this article is direct evidence that, to one of ordinary skill in 
the art, the two limitations in the Claims as amended produce a useful, concrete and tangible 
result. This contradicts the Examiner's position if the rejection is applied to the claims as 
amended. 

Another article citing or referring to the SAM article is "Ancestral antibiotic resistance in 
Mycobacterium tuberculosis," by Rowan P. Morris, et al. (published online before print August 
15, 2005, 10.1073/pnas.0505446102), PNAS, August 23, 2005, Volume 102, No. 34, pages 
12200-12205. As described on page 5 of this article by Morris et al., monocytes were infected 
by mycobacterium and activated. Labeling of RNA and hybridizations were performed. Then, 
"data from each experimental condition was analyzed separately by using significance analysis 
of microarrays (22) with a false discovery rate < 0.3%." The reference to "significance 
analysis of microarrays (22)" is to the SAM article. Dr. Chu's declaration states that he believes 
that the data analysis using SAM in the above quote from the Morris article uses the two claim 
limitations discussed above through the use of SAM software. 

Thus, in each of the articles by Miller et al. and Morris et al., the two claim limitations in 
the claims as amended discussed above, apparently without more, allow genes whose associated 
values differ by an amount of statistical significance to be identified for very practical, tangible, 
useful and concrete applications, so that the invention in the claims as amended is recognized 
by those skilled in the art as producing a useful, concrete and tangible result. 

The above two articles are merely two of the roughly 500 articles that refer to or use the 
techniques in the SAM article. More examples of useful, concrete and tangible results produced 
by application of the invention in the Claims as amended can undoubtedly be found in other 
articles in addition to the two described above. We therefore believe that there is ample evidence 
that the invention of the Claims as amended produces a useful, concrete and tangible result, 
contrary to the opinion of the Examiner. If the Examiner disagrees, it is respectfully requested 
that the Examiner explain in detail the reasons why such rejection is still maintained in the face 
of overwhelming evidence that weighs against such position. 

In responding to the Applicant's arguments, the Office Action stated that "in the instant 
Claims, the Claims as a whole, do not result in the physical transformation and as a whole do not 
constitute a practical application of an abstract idea," quoting State Street Bank & Trust v. 
Signature Financial Group, 47 USPQ 2d at 1600. The Office Action continues: "Thus, data 
transformation is not necessarily a physical transformation, as it is the result as a whole that is 
the focus. It is noted that the Arrhythmia case, for example, 'constituted a practical Application 
of an abstract idea because it corresponded to a useful, concrete or tangible thing - the condition 
of the patient's heart,' which is not the case in the instant claimed invention." 

We disagree with the above statement by the Examiner. The various claim elements in 
the rejected claims do not merely manipulate abstract concepts or data, but the associated values 
that are manipulated correspond to useful, concrete and tangible things - the levels of mRNA or 
levels of protein. As further clarified by the claim amendments herein, these levels are measured 
from biological samples containing the genes. These levels may, in turn, indicate certain 
significant biological characteristics or activity. As clearly described in the specification, such 
biological characteristics or activity may, for example, be caused by the effect of radiation on 
genes, by inducing or repressing the genes. The Examiner may again object on the grounds that 
there is inadequate nexus between such practical real world applications and the different claim 
elements in the rejected Claims. We believe, however, that there is no requirement under 35 
U.S.C. § 101 that very specific applications such as gene inducement or repression need to be 
recited in the claims themselves. This is true in the Arrhythmia case as well. As will be noted 
from the Arrhythmia case, the condition of the patient's heart is also not directly recited in the 
Claims at issue. The Claim at issue in the Arrhythmia case merely recites the steps of converting 
QRS signals to time segments, applying a portion of the time segments in reverse time order to 
high pass filter means, determining an arithmetic value of the amplitude of the output of the filter 
and comparing the value with a pre-determined level. The patient's heart condition does not 
appear anywhere in the claim language. The data manipulated by the method in 
Arrhythmia are an indication of the condition of the patient's heart, even though the condition of 
the patient's heart is not directly recited in the Claim. Analogous to the Claim in Arrhythmia, the 
associated values that are manipulated in the Claims as amended indicate significant biological 
characteristics or activity. When the biological samples analyzed are samples that have been 
irradiated and control samples that have not been irradiated, the invention of the claims as amended 
will reveal the genes that have been suppressed and those that have been induced. Analogous 
applications are also shown in the Miller and Morris articles. The claims as amended are no 
different from the Arrhythmia claims where the signals manipulated in the claimed method 
represent heart activity of a patient without actually reciting in the claim the condition of the 
patient's heart. 

We therefore disagree with the Examiner's opinion quoted above. 

UTILITY 

Claims 1-22, 28-30, 33, 44, 46, 58, 60, 65 and 66 remain rejected under 35 U.S.C. 101 
because the claimed invention lacks patentable utility. The rejection is respectfully traversed. 
The discussion above clearly describes many utilities of the invention in the rejected claims. 
Withdrawal of this rejection is respectfully requested. 

In the amendment mailed January 19, 2005, Applicants set forth a number of specific 
Utilities on page 12 of the Amendment. These include the identification of genes whose DNA 
has been damaged by exposure to radiation, the identification of genes in tumors (page 18, line 
24) or the identification of genes whose expression correlates with the survival time of patients 
(page 19, line 16), or with tumor stage (sentence bridging pages 19 and 20). These Utilities were 
rejected by the Examiner in the Office Action mailed April 19, 2005 on the ground that the 
Claims do not recite steps that are applicable to any of these uses and that there is no adequate 
nexus between the disclosed subject matter and these asserted Utilities. We disagree. 

As will be evident from the present Application, as well as the articles described above, 
the utility of the invention of the rejected Claims is achieved by simply applying the steps 
involved to particular biological samples. In the present Application, the recited features were 
applied to samples that were grown with exposure to radiation and samples that were grown 
without exposure to radiation. The invention of the claims as amended then enables the 
identification of genes whose expression has been induced or repressed by radiation. The same 
is true in the case of the above articles involving tumor and breast cancer analysis. This is true 
also in the case of the Claims in Arrhythmia Research Technology, Inc. v. Corazonix Corp., 22 
U.S.P.Q. 2d 1033 at 1038. Claim 1 in the Arrhythmia case recites steps that merely manipulate 
QRS signals which represent signals obtained from the heart, but otherwise do not recite 
anything involving the patient's heart condition. The Court nevertheless deemed such claim to 
have utility under 35 U.S.C. 101. In the same manner, it is believed that the Claims as amended 
have utility, without having to recite specific applications, which would unduly limit the claims. 

By rejecting the Claims based on the inadequate nexus between the disclosed subject 
matter and the asserted utilities, the Examiner is in fact requiring Applicants to restrict the 
Claims to particular applications. This is against the rule articulated by the Court of Appeals for 
the Federal Circuit in State Street Bank & Trust v. Signature Financial Group, 47 USPQ 2d 1600 
at 1604. In the ruling by the lower court in this case, the patent was found invalid "because the 
'056 patent is claimed [sic] sufficiently broadly to foreclose any computer-implemented 
accounting method necessary to manage this type of financial structure." In response, the Court 
of Appeals for the Federal Circuit stated as follows: "whether the patent's Claims are too broad 
to be patentable is not to be judged under Section 101, but rather under Sections 102, 103, and 
112. Assuming the above statement to be correct, it has nothing to do with whether what is claimed is 
statutory subject matter." 

Rejection under 35 U.S.C. 112 First Paragraph 

Claims 1-22, 28-30, 33, 44, 46, 58, 60 and new Claims 65 and 66 are rejected under 35 
U.S.C. 112, first paragraph, for failing to comply with the written description requirement. 
Specifically, Claims 1, 28, 46, 58, 60, 64 and 65 are rejected on the ground that they still 
include the term "protein." We disagree. As expressly stated in lines 1-5 on page 4 of the 
specification, one of the examples given for the values associated with genes is the levels of 
protein encoded by the genes. This is also explicitly claimed in Claim 4 of the Claims as 
originally filed. As noted in MPEP 2163, page 2100-173, there is a strong presumption that an 
adequate written description of the claimed invention is present when the Application is filed. 
Further, the PTO has the initial burden of presenting evidence or reasons why persons skilled in 
the art would not recognize in the disclosure a description of the invention defined by the claims. 
The Examiner has failed to present any evidence why persons skilled in the art would not 
recognize a description of the invention originally present both in the Claims and the Summary 
of the Invention, or why the claims still fail to comply with 35 U.S.C. 112 for lacking an 
adequate written description. It is respectfully requested that this rejection be withdrawn. 



In view of the amendments and remarks contained herein, it is believed that all claims are 
in condition for allowance and an indication of their allowance is requested. However, if the 
Examiner is aware of any additional matters that should be discussed, a call to the undersigned 
attorney at (415) 318-1162 would be appreciated. 



PARSONS HSUE & de RUNTZ LLP 
595 Market Street, Suite 1900 
San Francisco, California 94105 
Telephone: 415.318.1160 (main) 
Telephone: 415.318.1162 (direct) 
Fax: 415.693.0194 



CONCLUSION 




Reg. No. 29,545 
Microarrays can measure the expression of thousands of genes to identify changes in 
expression between different biological states. Methods are needed to determine the 
significance of these changes while accounting for the enormous number of genes. We 
describe a method, Significance Analysis of Microarrays (SAM), that assigns a score to 
each gene on the basis of change in gene expression relative to the standard deviation 
of repeated measurements. For genes with scores greater than an adjustable threshold, 
SAM uses permutations of the repeated measurements to estimate the percentage of genes 
identified by chance, the false discovery rate (FDR). When the transcriptional response 
of human cells to ionizing radiation was measured by microarrays, SAM identified 
34 genes that changed at least 1.5-fold with an estimated FDR of 12%, compared with 
FDRs of 60 and 84% by using conventional methods of analysis. Of the 34 genes, 19 were 
involved in cell cycle regulation and 3 in apoptosis. Surprisingly, four nucleotide 
excision repair genes were induced, suggesting that this repair pathway for UV-damaged 
DNA might play a previously unrecognized role in repairing DNA damaged by ionizing 
radiation. 



► Introduction 



DNA microarrays contain oligonucleotide or cDNA probes for measuring the expression of 
thousands of genes in a single hybridization experiment. Although massive amounts of 
data are generated, methods are needed to determine whether changes in gene expression 
are experimentally significant. Cluster analysis of microarray data can find coherent 
patterns of gene expression (1) but provides little information about statistical 
significance. Methods based on conventional t tests provide the probability (P) that a 
difference in gene expression occurred by chance (2, 3). Although P = 0.01 is 
significant in the context of experiments designed to evaluate small numbers of genes, 
a microarray experiment for 10,000 genes would identify 100 genes by chance. This 
problem led us to develop a statistical method adapted specifically for microarrays, 
Significance Analysis of Microarrays (SAM). 
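
The arithmetic behind the 100-gene figure can be checked with a short sketch. The function name is illustrative and does not come from the article; it simply multiplies the number of genes tested by the per-gene P threshold, assuming every gene is tested and none truly changes.

```python
def expected_chance_hits(n_genes: int, p_threshold: float) -> float:
    """Expected number of genes passing a per-gene P threshold by chance alone.

    Assumes all n_genes are tested and no gene has a real change in expression.
    """
    return n_genes * p_threshold


# The example quoted in the text: 10,000 genes tested at P = 0.01.
chance_hits = expected_chance_hits(10_000, 0.01)
print(round(chance_hits))
```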

SAM identifies genes with statistically significant changes in expression by assimilating 
a set of gene-specific t tests. Each gene is assigned a score on the basis of its change 
in gene expression relative to the standard deviation of repeated measurements for 
that gene. Genes with scores greater than a threshold are deemed potentially 
significant. The percentage of such genes identified by chance is the false discovery 
rate (FDR). To estimate the FDR, nonsense genes are identified by analyzing 
permutations of the measurements. The threshold can be adjusted to identify smaller 
or larger sets of genes, and FDRs are calculated for each set. To demonstrate its 
utility, SAM was used to analyze a biologically important problem: the transcriptional 
response of lymphoblastoid cells to ionizing radiation (IR). 

► Materials and Methods 

Preparation of RNA. Human lymphoblastoid cell lines GM14660 and GM08925 (Coriell Cell 
Repositories, Camden, NJ) were seeded at 2.5 × 10^5 cells/ml and exposed to IR 24 h 
later. RNA was isolated, labeled, and hybridized to the HuGeneFL GeneChip microarray 
according to manufacturer's protocols (Affymetrix, Santa Clara, CA). 

Microarray Hybridization. Each gene in the microarray was represented by 
20 oligonucleotide pairs, each pair consisting of an oligonucleotide perfectly matched 
to the cDNA sequence, and a second oligonucleotide containing a single base 
mismatch. Because gene expression was computed from differences in hybridization to 
the matched and mismatched probes, expression levels were sometimes reported by 
the GeneChip analysis suite software as negative numbers. 

Northern Blot Hybridization. Total RNA (15 μg) was resolved by agarose gel 
electrophoresis, transferred to a nylon membrane, and hybridized to specific 
radiolabeled DNA probes, which were prepared by PCR amplification. 

► Results 

RNA was harvested from wild-type human lymphoblastoid cell lines, designated 1 and 2, 
growing in an unirradiated state (U) or in an irradiated state (I) 4 h after exposure 
to a modest dose of 5 Gy of IR. RNA samples were labeled and divided into two identical 
aliquots for independent hybridizations, A and B. Thus, data for 6,800 genes on the 
microarray were generated from eight hybridizations (U1A, U1B, U2A, U2B, I1A, I1B, 
I2A, and I2B). 

We scaled the data from different hybridizations as follows. A reference data set was 
generated by averaging the expression of each gene over all eight hybridizations. The 
data for each hybridization were compared with the reference data set in a cube root 
scatter plot. We chose the cube root scatter plot because it resolved the vast majority 
of genes that are expressed at low levels and permitted the inclusion of negative levels 
of expression that are sometimes generated by the GeneChip software. A linear least- 
squares fit to the cube root scatter plot was then used to calibrate each hybridization. 
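
The scaling step can be sketched as follows. The article does not spell out exactly how the fitted line is applied, so the mapping in `calibrate` (fit the cube roots of one hybridization against the cube roots of the reference, then cube the fitted values) is an assumption, and all function names are illustrative. A signed cube root is used so that negative expression levels can be included, as the text describes.

```python
import math


def signed_cbrt(x: float) -> float:
    """Cube root that preserves the sign, so negative expression levels are allowed."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(hybridization, reference):
    """Map one hybridization onto the reference scale (assumed procedure)."""
    cube_hyb = [signed_cbrt(v) for v in hybridization]
    cube_ref = [signed_cbrt(v) for v in reference]
    slope, intercept = fit_line(cube_hyb, cube_ref)
    # Apply the fitted line on the cube root scale, then return to the linear scale.
    return [(slope * c + intercept) ** 3 for c in cube_hyb]


# Toy data: a hybridization that reads 8 times brighter than the reference.
reference = [-1.0, 0.0, 8.0, 27.0, 125.0]
hybridization = [8.0 * v for v in reference]
calibrated = calibrate(hybridization, reference)
```

In this toy case the calibrated values land back on the reference values, because the only difference between the two data sets is a constant scale factor.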

After scaling, a linear scatter plot was generated for average gene expression in the 
four A aliquots (U1A, I1A, U2A, and I2A) vs. the average in the four B aliquots (U1B, 
I1B, U2B, and I2B), a partitioning of the data that eliminates biological changes in 
gene expression (Fig. 1A). The linear scatter plot confirmed that the data were 
generally reproducible but failed to resolve genes expressed at low levels. Better 
resolution of these genes was achieved by the cube root scatter plot (Fig. 1B), which 

http://www.pnas.org/cgi/content/full/98/9/5116 9/9/2005 
revealed three salient features: the large percentage of genes (24%) assigned 
negative levels of expression, the large percentage of genes with low levels of 
expression, and the low signal-to-noise ratio at low levels of expression. 



Fig. 1. Gene expression measured by microarrays. 
(A) Linear scatter plot of gene expression. Each gene 
(i) in the microarray is represented by a point with 
coordinates consisting of average gene expression 
measured from the four A hybridizations (avg x_A) 
and the average gene expression in the four B 
hybridizations (avg x_B). (B) Cube root scatter plot of 
gene expression. The average gene expression from 
the A and B hybridizations have been plotted on a 
cube root scale to resolve genes expressed at low 
levels. (C) Cube root scatter plot of average gene 
expression from the four hybridizations with 
uninduced cells (avg x_U) and induced cells 4 h after 
exposure to 5 Gy of IR (avg x_I). Some of the genes 
that responded to IR are indicated by arrows. 

To assess the biological effect of IR, a scatter plot was generated for average gene 
expression in the four irradiated states vs. the four unirradiated states (compare Fig. 1 
B and C). A few of the potentially significant changes in gene expression are indicated 
by arrows in Fig. 1C, but the effect was not easily quantified, and a method was 
needed to identify changes with statistical confidence. 

Our approach was based on analysis of random fluctuations in the data. In general, 
the signal-to-noise ratio decreased with decreasing gene expression (Fig. 1). However, 
even for a given level of expression, we found that fluctuations were gene specific. To 
account for gene-specific fluctuations, we defined a statistic based on the ratio of 
change in gene expression to standard deviation in the data for that gene. The 
"relative difference" d(i) in gene expression is: 

\[ d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0} \qquad [1] \]

where $\bar{x}_I(i)$ and $\bar{x}_U(i)$ are defined as the average levels of expression for gene (i) in 
states I and U, respectively. The "gene-specific scatter" s(i) is the standard deviation of 
repeated expression measurements: 

\[ s(i) = \sqrt{ a \left\{ \sum_m \left[ x_m(i) - \bar{x}_I(i) \right]^2 + \sum_n \left[ x_n(i) - \bar{x}_U(i) \right]^2 \right\} } \qquad [2] \]

where $\sum_m$ and $\sum_n$ are summations of the expression measurements in states I and U, 
respectively, $a = (1/n_1 + 1/n_2)/(n_1 + n_2 - 2)$, and $n_1$ and $n_2$ are the numbers of 
measurements in states I and U (four in this experiment). 
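
A minimal sketch of the two statistics just defined, for a single gene, is given here. The function and argument names are illustrative rather than taken from the article, and the expression values in the usage example are made up for the purpose of the illustration.

```python
import math


def gene_scatter(state_i, state_u):
    """Gene-specific scatter s(i) of Eq. 2 for one gene.

    state_i, state_u: repeated expression measurements in states I and U.
    """
    n1, n2 = len(state_i), len(state_u)
    mean_i = sum(state_i) / n1
    mean_u = sum(state_u) / n2
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    sum_sq = (sum((x - mean_i) ** 2 for x in state_i)
              + sum((x - mean_u) ** 2 for x in state_u))
    return math.sqrt(a * sum_sq)


def relative_difference(state_i, state_u, s0):
    """Relative difference d(i) of Eq. 1 for one gene, with constant s0."""
    mean_i = sum(state_i) / len(state_i)
    mean_u = sum(state_u) / len(state_u)
    return (mean_i - mean_u) / (gene_scatter(state_i, state_u) + s0)


# Made-up measurements: four irradiated and four unirradiated hybridizations.
irradiated = [10.0, 12.0, 14.0, 16.0]
unirradiated = [4.0, 6.0, 8.0, 10.0]
s_value = gene_scatter(irradiated, unirradiated)
d_value = relative_difference(irradiated, unirradiated, s0=3.3)
```

With s0 set to zero the statistic reduces to a conventional two-sample t statistic with pooled variance; the added constant only damps genes whose scatter is very small.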

To compare values of d(i) across all genes, the distribution of d(i) should be 
independent of the level of gene expression. At low expression levels, variance in d(i) 
can be high because of small values of s(i). To ensure that the variance of d(i) is 
independent of gene expression, we added a small positive constant s0 to the 
denominator of Eq. 1. The coefficient of variation of d(i) was computed as a function 
of s(i) in moving windows across the data. The value for s0 was chosen to minimize 
the coefficient of variation. For the data in this paper, this computation yielded 
s0 = 3.3. 
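
One way to read the selection of this constant is sketched below. The article gives no further detail, so several choices here are assumptions: genes are sorted by their scatter, split into non-overlapping windows (a simplification of the moving windows in the text), the spread of the relative difference is measured in each window, and the candidate constant whose window spreads vary least (smallest coefficient of variation) is kept. The function name and the candidate grid are hypothetical.

```python
import statistics


def choose_s0(numerators, scatters, candidates, window=50):
    """Pick the candidate s0 that makes the spread of d(i) most uniform across s(i).

    numerators[i]: difference of average expression (state I minus state U) for gene i.
    scatters[i]:   gene-specific scatter s(i) for gene i.
    Assumes at least one full window of genes and a nonzero spread of d(i).
    """
    order = sorted(range(len(scatters)), key=lambda i: scatters[i])
    best_s0, best_cv = None, float("inf")
    for s0 in candidates:
        d = [numerators[i] / (scatters[i] + s0) for i in order]
        # Spread of d(i) inside each window of genes with similar scatter.
        spreads = [statistics.pstdev(d[k:k + window])
                   for k in range(0, len(d) - window + 1, window)]
        # Coefficient of variation of those spreads across windows.
        cv = statistics.pstdev(spreads) / statistics.mean(spreads)
        if cv < best_cv:
            best_s0, best_cv = s0, cv
    return best_s0


# Synthetic data built so that s0 = 2.0 makes d(i) alternate between -1 and +1.
scatters = [0.1 * (i + 1) for i in range(40)]
signs = [-1.0 if i % 2 == 0 else 1.0 for i in range(40)]
numerators = [sign * (s + 2.0) for sign, s in zip(signs, scatters)]
chosen = choose_s0(numerators, scatters, [0.0, 1.0, 2.0, 5.0], window=4)
```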

Scatter plots of d(i) vs. s(i) are shown in Fig. 2. The scatter plot for relative difference 
between states I and U is shown in Fig. 2A. By contrast, the scatter plot for relative 
difference between cell lines 1 and 2 shows more marked changes in Fig. 2B. These 
relative differences exceeded random fluctuations in the data, as measured by the 
relative difference between hybridizations A and B in Fig. 2C. 



Fig. 2. Scatter plots of relative difference in gene 
expression d(i) vs. gene-specific scatter s(i). The data were 
partitioned to calculate d(i) as indicated by the bar codes. 
The shaded and unshaded entries were used for the first and 
second terms in the numerator of d(i) in Eq. 1. (A) Relative 
difference between irradiated and unirradiated states. The 
statistic d(i) was computed from expression measurements 
partitioned between irradiated and unirradiated cells. (B) 
Relative difference between cell lines 1 and 2. The statistic 
d(i) was computed from expression measurements partitioned 
between cell lines 1 and 2. (C) Relative difference between 
hybridizations A and B. The statistic d(i) was computed from 
the permutation in which the expression measurements were 
partitioned between the equivalent hybridizations A and 
B. (D) Relative difference for a permutation of the data that 
was balanced between cell lines 1 and 2. 



Although the relative difference computed from hybridizations A and B provided a 
control for random fluctuations, additional controls were needed to assign statistical 
significance to the biological effect of IR. Instead of performing more experiments, 
which are expensive and labor intensive, we generated a large number of controls by 
computing relative differences from permutations of the hybridizations for the four 
irradiated and four unirradiated states. To minimize potentially confounding effects 
from differences between the two cell lines, we analyzed the data by using the 
36 permutations that were balanced for cell lines 1 and 2. Permutations were defined 
as balanced when each group of four experiments contained two experiments from cell 
line 1 and two experiments from cell line 2. Fig. 2 C and D are examples of balanced 
permutations. 
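
The count of 36 can be reproduced by enumeration. The sketch below uses the eight hybridization labels from the Results section and treats a permutation as a choice of four hybridizations to play the role of the first group; the helper names are illustrative.

```python
from itertools import combinations

# Labels follow the text: state (U or I), cell line (1 or 2), aliquot (A or B).
HYBRIDIZATIONS = ["U1A", "U1B", "U2A", "U2B", "I1A", "I1B", "I2A", "I2B"]


def cell_line(label: str) -> str:
    """Cell line is the second character of a label such as 'U1A'."""
    return label[1]


def balanced_permutations(labels=HYBRIDIZATIONS):
    """All splits into two groups of four with two hybridizations per cell line in each."""
    splits = []
    for first_group in combinations(labels, len(labels) // 2):
        if sum(cell_line(h) == "1" for h in first_group) == 2:
            second_group = tuple(h for h in labels if h not in first_group)
            splits.append((first_group, second_group))
    return splits


splits = balanced_permutations()
print(len(splits))  # 6 ways to pick from line 1 times 6 ways to pick from line 2
```

The enumeration includes the original irradiated vs. unirradiated grouping as well as the A vs. B grouping shown in Fig. 2C.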

To find significant changes in gene expression, genes were ranked by magnitude of
their d(i) values, so that d(1) was the largest relative difference, d(2) was the second
largest relative difference, and d(i) was the ith largest relative difference. For each of
the 36 balanced permutations, relative differences d_p(i) were also calculated, and the
genes were again ranked such that d_p(i) was the ith largest relative difference for
permutation p. The expected relative difference, d_E(i), was defined as the average over
the 36 balanced permutations, d_E(i) = Σ_p d_p(i)/36.
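
As an illustration, the expected relative difference can be computed by sorting each permutation's statistics and averaging rank by rank. This is a sketch on toy numbers; the function name and the input layout (one list of values per permutation) are assumptions made here.

```python
def expected_relative_differences(permuted_d):
    """Average the i-th largest statistic across permutations.

    permuted_d: list of lists, one list of relative differences per permutation.
    """
    ranked = [sorted(d_p, reverse=True) for d_p in permuted_d]
    n_perm = len(ranked)
    n_genes = len(ranked[0])
    return [sum(r[i] for r in ranked) / n_perm for i in range(n_genes)]

# Toy example with two permutations and three genes.
toy = [[0.5, -1.0, 2.0], [1.0, 0.0, -2.0]]
d_E = expected_relative_differences(toy)
print(d_E)  # [1.5, 0.25, -1.5]
```

Note that the average is taken over ranks, not over genes: the gene holding rank i may differ from one permutation to the next.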

To identify potentially significant changes in expression, we used a scatter plot of the
observed relative difference d(i) vs. the expected relative difference d_E(i) (Fig. 3A). For
the vast majority of genes, d(i) ≅ d_E(i), but some genes are represented by points
displaced from the d(i) = d_E(i) line by a distance greater than a threshold Δ. For
example, the threshold Δ = 1.2 illustrated by the broken lines in Fig. 3A yielded
46 genes that were "called significant." These 46 genes are shown in the context of
the scatter plot for d(i) vs. s(i) (Fig. 3B) and in the scatter plot for the cube root of
gene expression x̄_I(i) vs. x̄_U(i) (Fig. 3C). Genes identified by d(i) do not necessarily
have the largest changes in gene expression.



Fig. 3. Identification of genes with significant
changes in expression. (A) Scatter plot of the
observed relative difference d(i) versus the expected
relative difference d_E(i). The solid line indicates the
line for d(i) = d_E(i), where the observed relative
difference is identical to the expected relative
difference. The dotted lines are drawn at a distance
Δ = 1.2 from the solid line. (B) Scatter plot of d(i) vs.
s(i). (C) Cube root scatter plot of average gene
expression in induced and uninduced cells. The
cutoffs for 2-fold induction and repression are
indicated by the dashed lines. In A-C, the
46 potentially significant genes for Δ = 1.2 are
indicated by the squares.

To determine the number of falsely significant genes generated by SAM, horizontal
cutoffs were defined as the smallest d(i) among the genes called significantly induced
and the least negative d(i) among the genes called significantly repressed. The number
of falsely significant genes corresponding to each permutation was computed by
counting the number of genes that exceeded the horizontal cutoffs for induced and
repressed genes. The estimated number of falsely significant genes was the average of
the number of genes called significant from all 36 permutations. For Δ = 1.2, the
permuted data sets generated an average of 8.4 falsely significant genes, compared
with 46 genes called significant, yielding an estimated FDR of 18% (Table 1). As
Δ decreased, the number of genes called significant by SAM increased but at the cost
of an increasing FDR. (Omitting s0 from Eq. 1 produced higher FDRs of 45, 35, and
28% for Δ = 0.6, 0.9, and 1.2.)
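
The thresholding and FDR estimate described above can be sketched as follows. This is an illustrative simplification, not the SAM implementation: it assumes that "distance from the line" means the vertical difference d(i) − d_E(i), it counts permuted values at or beyond the cutoffs, and the function name and toy inputs are hypothetical.

```python
def estimate_fdr(d_obs, d_perms, delta):
    """Call genes at threshold delta and estimate the FDR from permuted statistics."""
    n = len(d_obs)
    d_sorted = sorted(d_obs, reverse=True)
    ranked = [sorted(p, reverse=True) for p in d_perms]
    d_exp = [sum(r[i] for r in ranked) / len(ranked) for i in range(n)]

    # Genes displaced from the d(i) = d_E(i) line by more than delta (assumed vertical).
    induced = [d for d, e in zip(d_sorted, d_exp) if d - e > delta]
    repressed = [d for d, e in zip(d_sorted, d_exp) if e - d > delta]
    n_called = len(induced) + len(repressed)

    # Horizontal cutoffs: smallest induced d(i), least negative repressed d(i).
    upper = min(induced) if induced else float("inf")
    lower = max(repressed) if repressed else float("-inf")

    # Average number of permuted statistics at or beyond the cutoffs.
    false_counts = [sum(1 for d in p if d >= upper or d <= lower) for p in d_perms]
    n_false = sum(false_counts) / len(false_counts)
    fdr = n_false / n_called if n_called else 0.0
    return n_called, n_false, fdr

# Toy data: four genes, two permutations.
d_obs = [5.0, 0.2, -0.1, -4.0]
d_perms = [[5.5, 0.1, -0.2, -1.0], [0.8, 0.3, -0.1, -1.2]]
print(estimate_fdr(d_obs, d_perms, delta=1.2))  # (2, 0.5, 0.25)

# The figures quoted in the text: 8.4 false calls out of 46 called.
print(round(100 * 8.4 / 46))  # 18
```

The last line reproduces the arithmetic behind the reported 18% FDR for Δ = 1.2.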



Table 1. Comparison of methods for identifying changes in gene expression



Our method for setting thresholds provides asymmetric cutoffs for induced and
repressed genes. The alternative is the standard t test, which imposes a symmetric
horizontal cutoff, with d(i) > c for induced genes and d(i) < -c for repressed genes.
However, the asymmetric cutoff is preferred because it allows for the possibility that
d(i) for induced and repressed genes may behave differently in some biological
experiments.

SAM proved to be superior to conventional methods for analyzing microarrays (Table 1
and Fig. 4A). First, SAM was compared with the approach of identifying genes as
significantly changed if an R-fold change was observed. In this "fold change" method,
r(i) = x̄_I(i)/x̄_U(i), and gene i was called significantly changed if r(i) > R or r(i) < 1/R.
To permit computation of r(i) from negative values for gene expression, x̄_I(i) and x̄_U(i)
were converted to 10 when their values were negative or less than 10. This procedure
yielded unacceptably high FDRs of 73-84%.

http://www.pnas.org/cgi/content/full/98/9/5116 9/9/2005

Fig. 4. Comparison of SAM to conventional
methods for analyzing microarrays. (A) Falsely
significant genes plotted against number of genes
called significant. Of the 57 genes most highly
ranked by the fold change method, 5 were included
among the 46 genes most highly ranked by SAM. Of
the 38 genes most highly ranked by the pairwise fold
change method, 11 were included among the
46 genes most highly ranked by SAM. These results
were consistent with the FDR of SAM compared to
the FDRs of the fold change and pairwise fold
change methods. (B) Northern blot validation for
genes identified by the fold change method. Values
of r(i) are plotted for genes chosen at random from
the 57 genes most highly ranked by the fold change
method. (C) Validation for genes identified by SAM.
Results are plotted for genes chosen at random from
the 46 genes most highly ranked by SAM. Genes
analyzed by Northern blot are represented by circles.
TNF-α was validated by using a PreDeveloped
TaqMan assay (PE Biosystems) and is represented by
a square. The straight lines in B and C indicate the
position of exact agreement between Northern blot
and microarray results.
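
For comparison, the fold change rule described in the text above can be written in a few lines. This is a minimal sketch on made-up expression values; the function name and argument names are hypothetical, while the floor of 10 and the r(i) > R or r(i) < 1/R criterion follow the text.

```python
def fold_change_calls(mean_induced, mean_uninduced, R, floor=10.0):
    """Flag genes whose ratio of average expression exceeds R or falls below 1/R."""
    calls = []
    for x_i, x_u in zip(mean_induced, mean_uninduced):
        # Values that are negative or below the floor are set to the floor.
        x_i = max(x_i, floor)
        x_u = max(x_u, floor)
        r = x_i / x_u
        calls.append(r > R or r < 1.0 / R)
    return calls

# Three toy genes: 4-fold up, both values floored (ratio 1), 4-fold down.
print(fold_change_calls([200.0, -5.0, 25.0], [50.0, 8.0, 100.0], R=3.6))
# [True, False, True]
```

The rule uses only the ratio of averages, so it has no term for the gene-specific scatter that d(i) divides by.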



Another approach attempts to account for uncertainty in the data by identifying genes
as significantly changed if an R-fold change is observed consistently between paired
samples (4). To apply this "pairwise fold change" method to our four data sets before
IR and four data sets after IR, changes in gene expression were declared significant if
12 of 16 pairings satisfied the criteria r(i) > R or r(i) < 1/R. Despite the demand for
consistent changes between paired samples, this method yielded FDRs of 60-71%.

To understand why fold-change methods fail, note that the vast majority of genes are
expressed at low levels where the signal-to-noise ratio is very low (Fig. 3C). Thus,
2-fold changes in gene expression occur at random for a large number of genes.
Conversely, for higher levels of expression, smaller changes in gene expression may be
real, but these changes are rejected by fold-change methods. The pairwise fold-change
method provides modest improvement but remains inferior to SAM.
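
The pairwise fold change rule (12 of 16 pairings) can be sketched in the same style. Two points are assumptions made here rather than statements from the text: that the 12 qualifying pairings must all change in the same direction, and that the floor of 10 from the simple fold change method is reused. The function name is hypothetical.

```python
def pairwise_fold_change(before, after, R, min_pairs=12, floor=10.0):
    """Return True if enough before/after pairings change R-fold in one direction."""
    up = down = 0
    for b in before:
        for a in after:
            r = max(a, floor) / max(b, floor)
            if r > R:
                up += 1
            elif r < 1.0 / R:
                down += 1
    # Assumption: "consistent" means the qualifying pairings share a direction.
    return max(up, down) >= min_pairs

# Four measurements before and four after give 4 x 4 = 16 pairings.
print(pairwise_fold_change([10, 12, 11, 9], [40, 45, 50, 8], R=2))  # True
print(pairwise_fold_change([100] * 4, [150] * 4, R=2))              # False
```

In the first toy gene, three of the four "after" values are at least 2-fold above every "before" value, giving exactly 12 qualifying pairings.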

Of the 46 genes most highly ranked by SAM (Δ = 1.2), 36 increased or decreased at
least 1.5-fold (R = 1.5). The number of falsely significant genes that met these two
criteria was 4.5, corresponding to an FDR of 12% (Table 1). Fas was identified three
times as alternately spliced forms, leaving 34 independent genes (Table 2). As an
indication of biological validity, 10 of the 34 genes have been reported in the literature
as part of the transcriptional response to IR. TNF-α was reported to be induced by
other investigators (5) but was repressed here. Quantitative reverse transcription-PCR
confirmed this result.





Table 2. Genes with changes in expression called significant by SAM





To test the validity of SAM directly, we performed Northern blots for genes that were
randomly selected from the 46 and 57 genes most highly ranked by SAM (Δ = 1.2) and
the fold-change method (at least 3.6-fold change), respectively. Northern blots showed
little correlation with the genes identified by the fold change method (Fig. 4B), but
strong correlation with the genes identified by SAM (Fig. 4C). Indeed, Northern blots
contradicted only 1 (maxiK) of 11 genes identified by SAM, consistent with our
estimated FDR.

Nineteen of the 34 genes most highly ranked by SAM appear to be involved in the cell
cycle. Three are known to be induced in a p53-dependent manner: p21, cyclin G1, and
mdm2 (6-8). Six cell cycle genes were repressed: E2-EPF, p55cdc, cyclin B, ckshs2,
cdc25, and wee1 (9, 10). Five genes encoding the mitotic machinery were also
repressed: PLK-1, MKLP-1, MCAK, C-TAK1, CENP-E (11-13). Three genes involved in
cell proliferation were induced or repressed: PTP(CAAX1), LPAP, and c-myc (14-18).
Some responses appeared paradoxical. For example, cdc25 phosphatase and wee1
kinase have antagonistic effects on the phosphorylation state of cdc2, but both genes
were repressed. Repression of these genes together with the mitotic genes may
represent a damage response that dismantles the cell cycle machinery until the cell has
repaired the damaged DNA.

Four of the 34 genes play roles in DNA repair, but none are involved in the repair of
IR-induced double-strand breaks. Instead, the genes (p48, XPC, gadd45, PCNA) have
roles in nucleotide excision repair, a pathway conventionally associated with
UV-induced damage (19-22). We confirmed the induction of these genes by Northern blot
(23-25). Fornace et al. reported defective removal of base damage induced by IR in
xeroderma pigmentosum cells (26). Leadon et al. reported that a novel DNA repair
pathway involving long excision repair patches of at least 150 nucleotides is activated
by IR but not UV (27). Our results suggest that this novel pathway might include p48,
XPC, gadd45, and PCNA.

Four of the 34 genes play roles in apoptosis (Fas, bbc3, TNF-α, OX40 ligand). The
remaining genes may have previously unsuspected roles in the DNA damage response
or may be among the estimated set of four falsely detected genes.

The 34 genes most highly ranked by SAM are only a subset of all of the genes that
change 1.5-fold with IR. Indeed, we calculated the difference between the number of
genes called significant and the number of falsely significant genes for decreasing
Δ = 0.3, 0.2, and 0.1, and found the differences to be 92, 170, and 184, respectively.
Thus, SAM suggests that approximately 180 of the 6,800 genes on the microarray

► Discussion

SAM is a method for identifying genes on a microarray with
statistically significant changes in expression, developed in
the context of an actual biological experiment. SAM was
successful in analyzing this experiment as well as several
other experiments with oligonucleotide and cDNA
microarrays (data not shown).

In the statistics of multiple testing (28-30), the family-wise error rate (FWER) is the
probability of at least one false positive over the collection of tests. The Bonferroni
method, the most basic method for bounding the FWER, assumes independence of the
different tests. An acceptable FWER could be achieved for our microarray data only if
the corresponding threshold was set so high that no genes were identified. The
step-down correction method of Westfall and Young (29), adapted for microarrays by
Dudoit et al. (http://www.stat.berkeley.edu/users/terry/zarray/Html/matt.html), allows
for dependent tests but still remains too stringent, yielding no genes from our data.

Westfall and Young (29) define "weak control" to be control of the FWER when all of 
the null hypotheses are true (i.e., when there are no changes in gene expression). 
"Strong control" is control of the FWER when any subset of the null hypotheses is true. 
Under certain conditions, weak control implies strong control. In fact, the step-down 
correction method exerts both weak and strong control. 

The method of Benjamini and Hochberg (31) assumes independent tests and
guarantees an upper bound for the FDR (with both weak and strong control) by a
step-up or step-down procedure applied to the individual P values. For our data, the P
value for each gene is calculated from permutations of the eight experiments. Because
of the limited number of permutations, the FDR is too "granular", and we identified
either zero or 300 significant genes, depending on how the P value was defined. A
similar granular result was obtained for the adaptation to dependent tests by
Benjamini et al. [The Control of the False Discovery Rate in Multiple Testing Under
Dependency (Department of Statistics and Operations Research, Tel Aviv University,
Tel Aviv), http://www.math.tau.ac.il/~ybenja/].
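
For readers unfamiliar with the step-up procedure mentioned above, the following is a textbook sketch of the Benjamini and Hochberg rule on toy P values. It is not taken from the paper; the function name and the example numbers are illustrative only.

```python
def benjamini_hochberg(p_values, q):
    """Return indices of hypotheses rejected by the step-up rule at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose ordered P value satisfies p_(k) <= k * q / m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

p = [0.001, 0.008, 0.039, 0.041, 0.6]
print(benjamini_hochberg(p, q=0.05))  # [0, 1]
```

Because permutation P values can only take a limited set of values when few permutations are available, many genes tie at the same P value and cross the step-up boundary together, which is one way to read the "granular" behavior described above.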

SAM does not have strong or weak control of the FWER. Instead, SAM provides an 
estimate of the FDR for each value of the tuning parameter Δ. The estimated FDR is
computed from permutations of the data and hence assumes that all null hypotheses 
are true, allowing for the possibility of dependent tests. It seems plausible that this 
estimated FDR approximates the strongly controlled FDR when any subset of null 
hypotheses is true. However, we have not proven this in general. It is possible for SAM 
to give an estimate of the FDR that is greater than 1. However, this has not occurred 
in our experience. Indeed, SAM provides a reasonably accurate estimate for the true 
FDR. To confirm this, we constructed artificial data sets in which a subset of genes was 
induced over a background of noise. SAM successfully identified the induced genes and 
estimated the FDR with reasonable accuracy. 

Although this paper analyzes a simple two-state experiment, SAM can be generalized
to other types of experiments by defining d(i) in a different way. Suppose the data
includes gene expression x_j(i) and a response parameter y_j, in which i = 1, 2, ... , m
genes, j = 1, 2, ... , n states. The generalized statistical parameter still takes the form
d(i) = r(i)/(s(i) + s0), except that the definitions of r(i) and s(i) change.

To identify genes with changes in expression in an experiment with three or more
states, the parameter d(i) is defined in terms of the Fisher's linear discriminant. One
goal might be to identify genes whose expression in one type of tumor is different from
its expression in other types of tumors. Suppose that a set of n samples consists of K
nonoverlapping subsets, such that the response parameter y_j ∈ {1, ... , K}. Define
C(k) = {j: y_j = k}. Let n_k = number of observations in C(k). The average gene
expression in each subset is x̄_k(i) = Σ_{j∈C(k)} x_j(i)/n_k, and the average gene
expression for all n samples is x̄(i) = Σ_j x_j(i)/n. Then define:

r(i) = \left\{ \frac{\sum_k n_k}{\prod_k n_k} \sum_k n_k \left[ \bar{x}_k(i) - \bar{x}(i) \right]^2 \right\}^{1/2}    [3]

s(i) = \left\{ \frac{\sum_k (1/n_k)}{\sum_k (n_k - 1)} \sum_k \sum_{j \in C(k)} \left[ x_j(i) - \bar{x}_k(i) \right]^2 \right\}^{1/2}    [4]
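
The multi-state definitions of r(i) and s(i) can be evaluated for a single gene as in the sketch below. The function name and the toy data are hypothetical; the formulas follow Eqs. 3 and 4 as given above, and s0 is set to zero here purely for simplicity.

```python
from math import sqrt, prod

def multiclass_d(x, y, s0=0.0):
    """Compute r, s, and d = r / (s + s0) for one gene.

    x: expression value of the gene in each of the n samples.
    y: class label of each sample (K distinct labels).
    """
    classes = sorted(set(y))
    groups = {k: [xj for xj, yj in zip(x, y) if yj == k] for k in classes}
    n_k = {k: len(g) for k, g in groups.items()}
    mean_k = {k: sum(g) / len(g) for k, g in groups.items()}
    grand_mean = sum(x) / len(x)

    # Eq. 3: between-class term.
    between = sum(n_k[k] * (mean_k[k] - grand_mean) ** 2 for k in classes)
    r = sqrt(sum(n_k.values()) / prod(n_k.values()) * between)

    # Eq. 4: pooled within-class term.
    within = sum((xj - mean_k[k]) ** 2 for k in classes for xj in groups[k])
    scale = sum(1.0 / n for n in n_k.values()) / sum(n - 1 for n in n_k.values())
    s = sqrt(scale * within)

    return r, s, r / (s + s0)

# Three classes with two samples each.
r, s, d = multiclass_d([1, 3, 5, 7, 9, 11], [1, 1, 2, 2, 3, 3])
print(round(r, 3), round(s, 3), round(d, 3))  # 6.928 1.732 4.0
```

With only two classes, r reduces to the absolute difference of the two class means, which matches the two-state case.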

SAM can be adapted for still other types of experimental data. For example, to identify
genes whose expression correlates with survival time, d(i) is defined in terms of Cox's
proportional hazards function, in which some of the patients remain alive or are lost to
follow-up at the time of the study. To identify genes whose expression correlates with
a quantitative parameter, such as tumor stage, d(i) can be defined in terms of the
Pearson correlation coefficient. Another example includes the definition of d(i) for
paired data, such as gene expression in tumors before and after chemotherapy. In
each case, the FDR is estimated by random permutation of the data for gene
expression among the different experimental arms, i.e., permutations among the n
arms of y_j. Thus, SAM is a robust and straightforward method that can be adapted to a
broad range of experimental situations. SAM and the adaptations discussed above are
available for use at http://www-stat-class.stanford.edu/SAM/SAMServlet .

SAM: Significance Analysis of
Microarrays

Supervised learning software 
for genomic expression data mining 

News

• New release 2.20, Oct 4, 2005. SAM now provides sample size assessment: estimates
of FDR, FNR, type I error and power for different sample sizes.
"A simple method for assessing sample sizes in microarray experiments" (pdf).

• Major New Release: Version 2.0, June 6, 2005. Now version 2.11 (Aug 24, 2005).
All users should upgrade to this version. SAM now handles time course data,
does non-parametric tests and pattern discovery. It also reports local false discovery
rates and miss rates.

A discussion and announcement group for all SAM-related discussions and
announcements has been created. See http://groups.yahoo.com/group/sam-software .

Features 

• Developed at Stanford University Labs: based on recent paper of 
Tusher, Tibshirani and Chu (2001): 

"Significance analysis of microarrays applied to the ionizing radiation 
response" (ps file), (pdf version). PNAS 2001 98: 5116-5121, (Apr 24). 
"Raw data" 

• Correlates gene expression data to a wide variety of clinical parameters 
including 

treatment, diagnosis categories, survival time and time trends 

• Provides estimate of False Discovery Rate for multiple testing 

• Convenient Excel Add-in 

• Works with data from both cDNA and oligo microarrays. Can also be 
applied to protein expression data and SNP chip data. 



• Patent Pending for SAM technology 
http://www-stat.stanford.edu/~tibs/SAM/

• SAM uses the FDR and q-value method presented in Storey (2002) A
direct approach to false discovery rates. J. Roy. Stat. Soc. Ser. B,
64:479-498;

Local false discovery rates proposed in Efron, B., Tibshirani, R., Storey,
JD, and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray
Experiment, JASA, 96, 1151-1160 and Efron and Tibshirani,
"Microarrays, Empirical Bayes Methods, and False Discovery Rates"
Genet. Epidemiol. 2002 Jun;23(1):70-86;

and Miss rates: Jon Taylor, Rob Tibshirani and Brad Efron. The "Miss
rate" for the analysis of gene expression data; Biostatistics 2005 6
(1):111-117.

• List of features 



• R package samr 

• Sample screens 

• Frequently Asked Questions 

• "Samster" tool

• Euan Ashley's heatmap builder 

• Related links: PAM package for microarray classification: 
CGH-Miner package for CGH data: 

PPC package for protein mass spec classification 
Superpc package for microarray prediction; 

Obtaining SAM 

• Academic users can download SAM by going directly to the registration 
page . Please note that this is the full version! 

• Non academic users should first register via the registration page . An 
evaluation version (limited to 500 genes) can be downloaded directly 
from that page. 

If you are a commercial user and wish to obtain a complete version of
SAM, proceed to the SAM resource at the Office of Technology and
Licensing . The SAM contact is Sara Nakashima
(sara.nakashima@stanford.edu) at the Office of Technology and Licensing,
Phone: (650) 725-9117.

Please do not contact Sara Nakashima about downloading, 
technical questions etc. All she handles is commercial licensing! 

• Returning users (those who have already registered) who want to 
download the software again can proceed directly to the Academic 
Download Page or the Non-Academic Download Page . You will need the 
registration information that you received via email.

Perturbations of the p53 pathway are associated with more aggressive
and therapeutically refractory tumors. However, molecular assessments
of p53 status by sequence analysis and immunohistochemistry
are incomplete indicators of p53 functional effects. We posited that the
transcriptional fingerprint is a more definitive downstream indicator of p53 function.
Herein, we analyzed transcript profiles of 251 p53-sequenced primary breast tumors
and identified a clinically embedded 32-gene expression signature that distinguishes
p53-mutant and wild-type tumors of different histologies and outperforms
sequence-based assessments of p53 in predicting prognosis and therapeutic response.
Moreover, the p53 signature identified a subset of aggressive tumors absent of sequence
mutations in p53 yet exhibiting expression characteristics consistent with p53
deficiency because of attenuated p53 transcript levels. Our results show the primary
importance of p53 functional status in predicting clinical breast cancer behavior.



microarray | expression analysis | tumor profiling | class prediction 



The p53 tumor suppressor is a critical regulator of tissue homeostasis, and its 
inactivation at the gene or protein level confers cellular properties conducive for 
oncogenesis and cancer progression. Mutations in p53 occur in >50% of human 
cancers (1, 2), and the mutational status of p53 is prognostic in many malignancies 
(3). In breast cancer, p53 mutations are associated with worse overall and disease- 
free survival, independent of other risk factors (4), and have been implicated in 
resistance to anticancer therapies (5-11). These observations, however, have been
inconsistent (12, 13), owing, in part, to the variable accuracy of the methods to
ascertain p53 status, variation in disease severity attributable to the different forms of
p53 mutation, and studies of insufficient size (8, 11, 14). Further confounding the
association between p53 status and patient risk is the growing number of alternative
molecular mechanisms (e.g., MDM2) that compromise p53 function.

http://www.pnas.org/cgi/content/full/102/38/13550 9/29/2005

In this study, we explored the possibility that a gene-expression signature, derived 
from differences between p53 mutant (mt) and wild-type (wt) breast tumors, could 
provide a more accurate measure of the functional configuration of p53, thereby 
improving its prognostic utility. Using oligonucleotide microarrays covering >30,000 
genes, we analyzed the global transcript levels of 251 primary invasive breast tumors 
for which we have detailed information on p53 status, as determined by cDNA 
sequencing (6) and pursued a validation strategy of intersecting alternative array data 
sets. We found that, in most cases, tumors with mt and wt p53 can readily be 
distinguished by their expression profiles and that a 32-gene p53 signature is 
consistently associated with patient survival in different patient subsets, independent 
of other risk factors, and is a superior prognostic and predictive indicator, compared 
with p53 mutation status alone. 

► Methods 

Patients and Specimens. Frozen tissue was collected from 315 
consecutively presented primary breast cancers representing 65% of 
all those resected in Uppsala County, Sweden, from January 1, 1987 to 
December 31, 1989 (6). Of these tissues, 251 were comprised 
predominantly of diseased tissue, were sequenced for p53 (6), and yielded sufficient 
RNA for array analysis. Clinicopathological variables measured at diagnosis were 
obtained from patient records and are described in detail in Supporting Materials and
Methods, which is published as supporting information on the PNAS web site. This
microarray study was approved by the ethical committee at the Karolinska Institute,
Stockholm, Sweden.

Expression Profiling. Total RNA was extracted from samples by using the RNeasy Mini
kit (Qiagen, Hilden, Germany) and evaluated on a 2100 Bioanalyzer (Agilent
Technologies). In vitro transcription products were prepared from 2-5 ng of total RNA,
hybridized to the Affymetrix U133 A and B arrays, and washed and scanned according
to the manufacturer's instructions.

Microarray Data Processing. Raw data were normalized by using the global mean 
method. Probe-set signal values were natural log transformed and scaled by adjusting 
the mean intensity to a target signal value of log 500. Samples with suboptimal 
average signal intensities (i.e., scaling factors >3.5) or GAPDH 3'/5' ratios >3.5 were 
relabeled and rehybridized on new arrays. If visible artifacts were observed, the same 
cRNA was rehybridized on new chips. 

Class Prediction. For gene selection, we fit a linear model to the expression data 
with expression level as the response and p53 status, estrogen-receptor (ER) status, 
and grade status as the predictor variables. As an initial filter, we excluded genes with
a P value for model fit >0.001 and ranked genes in decreasing order of the absolute
value of the p53 status coefficient. For class prediction, we evaluated several 
supervised learning methods, including diagonal linear discriminant analysis (15), k 
nearest neighbors (16), and support vector machines (17), as described in Supporting 
Materials and Methods. 
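
As a rough illustration of one of the classifiers named above, the sketch below implements a textbook diagonal linear discriminant rule and scores it with a leave-one-out loop. It is not the authors' pipeline: equal class priors are assumed, the gene-selection step is not repeated inside the loop, and all function names and data are hypothetical.

```python
def dlda_fit(samples, labels):
    """Estimate per-class gene means and a pooled per-gene variance."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {}
    for k in classes:
        rows = [s for s, l in zip(samples, labels) if l == k]
        means[k] = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
    dof = len(samples) - len(classes)
    pooled_var = []
    for g in range(n_genes):
        ss = sum((s[g] - means[l][g]) ** 2 for s, l in zip(samples, labels))
        pooled_var.append(ss / dof)  # assumed nonzero for every gene
    return means, pooled_var

def dlda_predict(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, pooled_var = model
    def score(k):
        return sum((xg - mg) ** 2 / v for xg, mg, v in zip(x, means[k], pooled_var))
    return min(means, key=score)

def leave_one_out_accuracy(samples, labels):
    """Refit on all samples but one, predict the held-out sample, and repeat."""
    hits = 0
    for i in range(len(samples)):
        model = dlda_fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        hits += dlda_predict(model, samples[i]) == labels[i]
    return hits / len(samples)

# Toy data: two genes, three "wt" and three "mt" samples.
X = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9], [3.0, 1.0], [3.2, 1.1], [2.8, 0.9]]
y = ["wt", "wt", "wt", "mt", "mt", "mt"]
print(leave_one_out_accuracy(X, y))  # 1.0
```

The "diagonal" in the name refers to ignoring correlations between genes, so only one variance per gene has to be estimated.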

Data Analysis. For all hierarchical cluster analyses, log expression values of each
gene were mean centered, and genes and tumors were clustered by using Pearson
correlation and average linkage (cluster and treeview software,
http://rana.lbl.gov/EisenSoftware.htm).

The Kaplan-Meier estimate was used to compute survival curves, and the P value of
the likelihood-ratio test was used to assess statistical significance of the hazard ratios.
All patients with contralateral or bilateral cancers were omitted, and patients who died
of their cancer 10 years after diagnosis were systematically censored.
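
The Kaplan-Meier estimate mentioned above can be sketched as follows. The function names and toy follow-up data are hypothetical, and the censoring helper reflects one reading of the 10-year rule (deaths after the horizon are treated as censored at the horizon), which is an assumption made here.

```python
def censor_at(times, events, horizon):
    """Treat any event after the horizon as censored at the horizon."""
    new_times = [min(t, horizon) for t in times]
    new_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_times, new_events

def kaplan_meier(times, events):
    """Return (time, survival probability) pairs at each observed death time.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored patients leave the risk set without lowering the curve, which is why the patient censored at time 3 changes the denominator at time 5 but not the estimate at time 3.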

For association tests, the χ² test was used, unless the number of events was <5 in any
category, in which case Fisher's exact test was used.

Cox regression was used to confirm the prognostic significance of the p53 classifier in 
multivariate analyses. The initial model, comprising all conventional predictors, and 
p53 mutation status and the p53 signature as competing measures of p53 activity, was 
simplified by using a stepwise model-selection procedure based on the Akaike 
information criterion. Remaining predictors were assessed by likelihood-ratio test. 

Independent Datasets. The Sørlie et al. (18) and Chen et al. (19) data and clinical
annotations were obtained from the Stanford microarray database by using filtering
parameters as described by the authors. The Ma et al. (20) "whole tumor" data set
was downloaded from the Gene Expression Omnibus with accession no. GSE1379
[NCBI GEO], and each array was mean centered. The van't Veer et al. (21) data and
survival annotation were accessed through the Rosetta Inpharmatics publications
archive. All image clone IDs or GenBank accession nos. of array probes were mapped to
UniGene build no. 167.



► Results 



P53 Mutant and WT Tumors Are Molecularly Distinct. Transcript
profiles of 251 primary breast tumors were assessed by using
Affymetrix U133 oligonucleotide microarrays. Previously, cDNA
sequence analysis revealed that 58 of these tumors had p53 mutations
resulting in protein-level changes, whereas the remaining 193 tumors were p53 wt (6).
By unsupervised hierarchical cluster analysis, we found that p53 mt and wt tumors are
distinguished by pervasive molecular differences. With the top 2,000 most variably
expressed genes (selected independent of p53 status), >80% of the p53 mt tumors
clustered into one branch and >70% of the p53 wts into the other (P = 5.6 × 10^-13;
see Fig. 5, which is published as supporting information on the PNAS web site).
Importantly, this separation remained highly significant (P < 2 × 10^-12) across a range
of gene panels from the top 5,000 genes with highest variance to the top 125 (see
Table 1, which is published as supporting information on the PNAS web site). This
separation was most heavily influenced by three predominant gene clusters comprising
genes involved in immune response, proliferation, and estrogen response (Fig. 5).
Univariate analysis by significance analysis of microarrays (SAM) (22) identified 6,545
Affymetrix probe sets representing ≈5,290 distinct genes whose expression patterns
distinguished p53 mt and wt tumors with a false discovery rate (q value) <1% and d
score (modified t statistic) >2.0 (see Table 2, which is published as supporting
information on the PNAS web site), further illuminating the extensive nature of the
molecular variation underlying p53 status. Topping the list of genes most highly
expressed in p53 mt tumors were those with roles in cell cycle and proliferation,
consistent with the observation that wt p53 has a negative regulatory effect on
cell-cycle genes. The genes more highly expressed in the p53 wt tumors included
uncharacterized genes, signaling molecules and transcription factors, transcriptional
targets of p53, and estrogen-inducible genes.

The p53 status was also correlated with two other clinical parameters, ER status and
tumor grade (Fig. 5). Within the p53 mt-rich cluster, we observed 89% of ER-negative
tumors (P = 1.9 × 10^-10), 79% of grade III tumors (P = 3.8 × 10^-11), and only 14% of
grade I tumors (P = 2.5 × 10^-7). The finding that p53 mutant tumors are correlated
with ER negativity and grade III status is consistent with previous reports that p53
mutations associate with ER negativity and high tumor grade (23).

A Gene Expression Classifier Predicts p53 Status in Independent Breast and
Liver Cancer Data Sets. We considered the possibility that the differential expression
observed between p53 mt and wt tumors might, to some extent, reflect changes in the
operational configuration of the p53 pathway. We reasoned that some p53 wt tumors
would be p53 deficient through mechanisms other than p53 mutation, such as MDM2
amplification or p14/ARF deletion and, thus, possess expression profiles more akin to
p53 mt tumors with dysfunctional p53. To explore this possibility, we fitted a
multivariate linear regression model (i.e., linear model fit) (24) that allowed us to rank
genes by their correlation with p53 status, while controlling for histologic grade and ER
status. As a result, many cell-cycle genes correlated with p53 status by univariate
analysis were no longer well associated (see Fig. 6, which is published as supporting
information on the PNAS web site), suggesting that the transcriptional profiles of most
cell-cycle genes are more related to histologic grade than to p53 status.

For class discrimination, we evaluated several linear learning methods including:
diagonal linear discriminant analysis (DLDA) (15), k-nearest neighbors (kNN) (16), and
support vector machines (SVM) (17). In each case, the optimal gene classifier was
obtained by leave-one-out cross validation, where the linear model-fit procedure was
iteratively applied to all samples minus the left-out sample. The resulting prediction
accuracies were highly similar, ranging from 84.9% to 85.7% (see Supporting
Materials and Methods). Interestingly, 20 tumors were consistently "misclassified" by
all three methods (8 wt and 12 mt), indicating a surprising degree of concordance
among misclassified tumors. DLDA showed the highest sensitivity for detecting p53
mutants (i.e., 79% sensitivity compared with 53% for both kNN and SVM) and was
therefore selected for further analysis. By DLDA, the optimal classifier was comprised
of 32 genes, whereby 26 of the wt tumors were misclassified as mutant-like, and 12
mutants were misclassified as wt-like (Fig. 1A).
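
The DLDA figures quoted above can be cross-checked from the counts given in the text (58 mt and 193 wt tumors, 12 mt and 26 wt misclassified). The helper below is a small illustrative sketch with a hypothetical name; "positive" is taken to mean p53 mutant.

```python
def classifier_metrics(n_mt, n_wt, mt_called_wt, wt_called_mt):
    """Sensitivity, specificity, and accuracy from misclassification counts."""
    true_pos = n_mt - mt_called_wt   # mutants correctly called mutant-like
    true_neg = n_wt - wt_called_mt   # wild types correctly called wt-like
    return {
        "sensitivity": true_pos / n_mt,
        "specificity": true_neg / n_wt,
        "accuracy": (true_pos + true_neg) / (n_mt + n_wt),
    }

metrics = classifier_metrics(n_mt=58, n_wt=193, mt_called_wt=12, wt_called_mt=26)
print({k: round(100 * v, 1) for k, v in metrics.items()})
# {'sensitivity': 79.3, 'specificity': 86.5, 'accuracy': 84.9}
```

The sensitivity of about 79% and the accuracy of about 84.9% agree with the values reported for DLDA.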

To evaluate the performance of the classifier genes (referred to hereafter as the p53 
signature genes) as a clinical discriminator of p53 status, we accessed two publicly 
available cDNA microarray data sets where p53 mutational status was known: a breast 
cancer study by Sørlie et al. (18) and a liver cancer study by Chen et al. (19). In the 
Sørlie data set, 69 breast tumors had been sequenced for p53 mutations. Of our p53 
signature genes, 28 mapped to established UniGene IDs, and more than half of these 
28 genes were represented on the Sørlie et al. microarray. However, only nine were 
found to correspond to cDNA probes having expression measurements present in 
>50% of tumors, where the tumors possessed measurements for >50% of genes 
(resulting in a subset of 44 well sampled tumors). Because the classification rules could 
not be directly applied, we used this 9-gene subset of the p53 signature to 
hierarchically cluster the tumors in an unsupervised manner. Fig. 1B shows a 
significant separation of p53 mt and wt tumors: 77% of mutants clustered into one 
branch, and 77% of wts clustered into the other (P = 0.0003). By Monte Carlo 
simulations, we estimated that the probability that a randomly selected nine-gene subset 
could cluster the samples with equivalent or better significance was P = 0.008, thus 
reaffirming the robust discriminative power of the p53 signature genes. 
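
The Monte Carlo estimate described above can be sketched as follows. The sketch draws random gene subsets of the same size as the signature subset and counts how often a random subset separates the classes at least as well as the observed one. The separation statistic used here (summed absolute difference of class means) is a simple stand-in for the clustering significance used in the study, and the data, gene names, and function names are hypothetical.

```python
import random

def separation_score(expression, labels, genes):
    """Sum over genes of |mean(mt) - mean(wt)|: a simple stand-in statistic."""
    score = 0.0
    for g in genes:
        mt = [v for v, lab in zip(expression[g], labels) if lab == "mt"]
        wt = [v for v, lab in zip(expression[g], labels) if lab == "wt"]
        score += abs(sum(mt) / len(mt) - sum(wt) / len(wt))
    return score

def monte_carlo_p(observed, gene_pool, subset_size, score_fn, n_iter=1000, seed=0):
    """Estimate P(random subset scores >= observed), with a +1 correction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        subset = rng.sample(gene_pool, subset_size)
        if score_fn(subset) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)

# Hypothetical data: 3 informative genes among 40, measured in 10 tumors.
data_rng = random.Random(1)
labels = ["mt"] * 5 + ["wt"] * 5
expression = {}
for i in range(40):
    shift = 2.0 if i < 3 else 0.0
    expression[f"gene{i}"] = [
        data_rng.gauss(shift if lab == "mt" else 0.0, 1.0) for lab in labels
    ]

signature = ["gene0", "gene1", "gene2"]
score = lambda genes: separation_score(expression, labels, genes)
p_value = monte_carlo_p(score(signature), list(expression), len(signature), score)
```

The +1 correction keeps the estimate from ever being exactly zero, so the smallest reportable value is bounded by the number of simulations performed.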

In the Chen et al. liver cancer data set (19), p53 protein levels had been ascertained 
by immunohistochemistry (IHC). Eight of our signature genes could be mapped to all 
59 tumors assayed for p53, with each gene having data present in >90% of all tumors 
and where each tumor contained data for >50% of the genes. We observed that even 
this eight-gene subset was able to cluster the liver cancers into two primary clusters 
significantly correlated with p53 levels: 87% of the IHC-positive (predicted mts) in one 
cluster, and 61% of the predicted wts in the other (P = 0.00035) (Fig. 1C). Again, the 
probability of this clustering occurring by random chance was P = 0.009 by Monte 
Carlo P value estimation. Taken together, these observations suggest that the genes 
comprising the p53 signature are robust in their ability to classify not only breast 
tumors but also liver cancers according to their p53 mutational status and, therefore, 
may have generalizable utility in predicting p53 status in a range of cancer types. 
http://www.pnas.org/cgi/content/full/102/38/13550 9/29/2005 

Fig. 1. The p53 signature is associated with p53 
status in independent data sets. Clustergrams are 
oriented as outlined in Fig. 5. (A) Expression profiles 
of the Uppsala tumors segregated by the 32-gene 
signature. UniGene symbols and GenBank IDs are 
listed to the right. (B) p53 mt and wt breast tumors 
from Sørlie et al. (18) were clustered by using a 
nine-gene subset of the p53 signature. (C) p53 mt and wt 
liver tumors (predicted by immunohistochemistry) 
from Chen et al. (19) were clustered by using an 
eight-gene subset of the p53 signature. Green 
dendrogram branches denote tumors with the wt-like 
configuration; red branches indicate those with 
mt-like profiles. Probe IMAGE clone IDs from the original 
studies are listed. Black bars denote mt p53 status. 

Transcript Analysis of p53 Pathway Genes Corroborates Tumor 
Classifications. We hypothesized that the p53 expression signature may better 
reflect the relative intactness of p53 function in the tumor than sequence mutation 
status alone, implying that p53 sequence-wt tumors "misclassified" as mt-like may, in 
fact, be p53 deficient by other means. First, we considered the possibility that p53 
deficiency could result from reduced p53 transcript levels. We compared the transcript 
levels of p53 among the different tumor classes (Fig. 2). We observed that the overall 
expression level of p53 was significantly reduced in the 26 wt tumors with mt-like 
signatures (referred to henceforth as the "26 mt-like" tumors), compared with the 
remaining 167 wt tumors classified as wt-like (P = 1.8 × 10⁻⁴), strongly suggesting that 
reduced p53 transcripts can result in biological consequences in vivo. 

Fig. 2. Transcript levels of p53 and its transcriptional 
targets are consistent with classification results. 
Expression levels of p53-pathway-relevant genes 
were examined in different tumor subgroups. The 
four tumor subgroups are defined as follows: (I) p53 
mt tumors classified as mt-like (n = 46), (II) p53 wt 
tumors classified as mt-like (n = 26), (III) p53 wt 
tumors classified as wt-like (n = 167), and (IV) p53 
mt tumors classified as wt-like (n = 12). Differences 
in transcript levels were determined by t test and are 
shown in a summary table to the right; P values 
>0.05 are shown in gray. 

We further hypothesized that known transcriptional targets of p53 would show altered 
transcription in p53-deficient tumors. Indeed, a number of p53 target genes 
demonstrated expression patterns consistent with a mutant p53 status (Fig. 2). The 
p53-inducible genes TP53INP1, SEMA3B, PMAIP1 (NOXA), FDXR, CCNG1, and LRDD, 
which all contain functional p53-binding sites in their promoters, showed significantly 
lower expression in the 26 mt-like tumors, compared with the other wt tumors (all at P < 
0.05). In a consistent manner, all but one of these genes were also significantly 
reduced in the p53 mt tumors, compared with all wt tumors. Furthermore, in all but 
two cases, these genes showed significantly higher expression in the set of 12 
sequence-mt tumors classified as wt-like when compared with the other mts, 
suggesting that the p53 mutations in these 12 tumors may have a more benign effect, 
with respect to p53 functionality. CHEK1 and CHEK2 are both upstream effectors of 
p53 function known to be transcriptionally repressed by p53. Significantly, their mRNA 
levels were elevated in both the p53 mt and p53 mt-like classes. Again, the 12 mts 
classified as wt-like showed a reversed pattern, i.e., displaying significantly lower 
expression of these genes, compared with the other 46 p53 mutants. Together, these 
observations suggest that the "misclassified" tumors more correctly reflect the 
active/inactive status of the p53 pathway and are consistent with the notion that 
reduced p53 levels in breast tumors result in downstream transcriptional changes 
similar to those found in p53 mutations. 

Of note, the canonical marker of p53 activity, CDKN1A (p21/WAF1), was only 
moderately higher in p53 wt tumors, compared with those with sequence mutations (P 
= 0.02), and not significantly lower in the 26 mt-like tumors, compared with the other 
wts (P = 0.09; data not shown). Furthermore, the known p53-inducible genes PERP, 
BAX, and SFN (14-3-3 sigma) were, paradoxically, all expressed at higher levels in the 
p53 mutants and the 26 mt-like tumors rather than the expected lower levels (Fig. 2). 
These observations may reflect cross-talk among different transcriptional regulators in 
the consensus of primary tissues, as compared with dynamic changes in single cell 
lines. For example, the p53 target genes p21 and BAX are also directly regulated by 
the breast cancer oncogene c-Myc in a manner independent of, and antagonistic to, 
p53 (25, 26). The regulation of p53 target genes by alternative transcriptional 
modifiers acting independently of p53 or in the context of p53 deficiency (e.g., PERP, 
BAX, and SFN) may have implications for p53 tumor-suppressor activity. 

We next asked whether the mutational spectrum of p53 in our tumors could explain 
the different functional consequences, as measured by the expression profiles. Of the 
46 p53 mt tumors correctly classified as mts, 43% (20 of 46) possessed "severe" 
mutations, defined as insertions (n = 2), deletions (n = 11), and stop codons (n = 7) 
resulting in frame shifts and truncations, whereas in the 12 p53 mutants classified as 
wt-like by the expression signature, only 1 contained a severe mutation, a 3-bp 
insertion in the DNA-binding domain, resulting in the in-frame addition of a glycine 
residue. Notably, this difference was statistically significant at P = 0.02. Using the IARC 
TP53 mutation database (ITMD) (27), we cross-compared the missense point 
mutations (mpms) in each tumor group with the ITMD's index of 418 mutants 
previously analyzed for dominant-negative function. Only 1 of the 11 mpms among the 
12 wt-like mutants had been demonstrated previously to have dominant-negative 
activity, compared with 12 of 27 within the mt-like group (P = 0.039). Together, these 
data suggest that, at the sequence level, the 12 p53 mutants classified as wt-like may, 
in fact, represent p53 mutant forms that have less biological effect. 
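
The two proportion comparisons in this paragraph (1 of 12 versus 20 of 46 severe mutations, and 1 of 11 versus 12 of 27 dominant-negative missense mutations) can be checked with a small Python sketch. The text does not state which test was used; the sketch assumes a one-sided Fisher's exact test, and the function name is hypothetical.

```python
from math import comb

def fisher_one_sided_lower(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X <= a), where X is the top-left cell under the
    hypergeometric null with all margins fixed.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "positive" cases
    n = a + b + c + d     # grand total
    lowest = max(0, row1 - (n - col1))
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(lowest, a + 1))
    return tail / total

# Severe mutations: 1 of 12 wt-like mutants versus 20 of 46 mt-like mutants.
p_severe = fisher_one_sided_lower(1, 11, 20, 26)

# Dominant-negative missense mutations: 1 of 11 versus 12 of 27.
p_dominant_negative = fisher_one_sided_lower(1, 10, 12, 15)
```

Under this assumption, both values come out close to the P values reported in the text, which suggests, but does not establish, that a one-sided exact test was the method applied.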

The p53 Signature Predicts Outcome Better Than p53 Mutation Status Alone. 

We next asked whether the p53 signature could predict disease-specific survival in the 
patients of the Uppsala cohort. The classifier separated patients into low and high risk 
groups with a much higher statistical significance than the sequence-based p53 status 
alone (P = 0.0006 versus P = 0.01, respectively) (Fig. 3 A and B). More interestingly, 
when the classifier was tested on the subset of women with wt p53 by sequence, we 
again observed a significant separation of patients by survival (P = 0.02; Fig. 3C), 
indicating that women with p53 sequence-wt tumors, yet exhibiting the mt-like 
expression signature, have a greater likelihood of dying from their cancer. Fig. 3D 
shows that the survival curve for this tumor type is highly similar to that of p53 mt 
tumors classified as mt-like (blue and green curves, respectively), whereas the 12 
individuals with p53 mt tumors classified as wt-like do not have significantly unique 
outcomes. 
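
The survival curves referred to above are Kaplan-Meier estimates. As a minimal sketch of how such a curve is computed, the following Python function applies the product-limit formula to a list of follow-up times and event indicators. The follow-up data are invented for illustration and the function name is an assumption; the sketch does not reproduce the significance tests used to compare the groups.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the patient died of disease at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (years) and event flags for five patients.
toy_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate can be computed for cohorts with incomplete follow-up.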

Fig. 3. The p53 classifier has greater prognostic 
significance than p53 mutation status alone. Kaplan-Meier 
survival plots for disease-specific survival are shown for 
patients classified according to p53 mutation status (A and 
E), the p53 classifier (B, C, and F), or both (D). All 
patients were assessed in A, B, and D. Only the patients 
with p53 wt tumors were assessed in C. Sixty-seven ER+, 
hormone-treated (TAM) patients were assessed in E and 
F. 

To further test the clinical utility of the p53 signature, we analyzed its prognostic 
performance on therapy-specific treatment groups. In a subpopulation of the Uppsala 
cohort consisting of 67 ER+ patients who received only adjuvant hormonal therapy 
after surgery, the signature was a significant predictor of disease-specific survival (P = 
0.05), whereas p53 mutation status alone was not (P = 0.4) (Fig. 3 E and F). 
Importantly, by multivariate Cox regression analysis, the p53 classifier remained 
significantly associated with survival in the hormone-treated group (P = 0.02), the 
complete cohort (P = 0.02), and the p53 wt group (P = 0.002), even when controlling 
for the classical predictors (ER and progesterone receptor) and prognostic factors 
(lymph node status, Elston grade, tumor size, and patient age), whereas the p53 
mutation status, as determined by sequencing, did not. This demonstrates that the 
expression classifier is more directly prognostic of patient survival than is p53 mutation 
status alone. 

The p53 Signature Predicts Outcome in Independent Therapy-Specific Data 
Sets. We next assessed the prognostic capability of the p53 signature genes in 
therapy-specific cohorts by using independent microarray data sets from the public 
domain (Fig. 4; and see Fig. 7, which is published as supporting information on the 
PNAS web site). First, we evaluated whether the signature genes were prognostic of 
tumor recurrence in the Ma et al. (20) data set of 60 breast tumors derived from 
patients treated with postoperative radiation and adjuvant tamoxifen monotherapy. In 
this cohort, patients with and without recurrent disease were matched with respect to 
tumor grade and tumor node metastasis stage. Twenty-two of the p53 signature genes 
mapped to 27 probes on the Ma et al. spotted oligonucleotide array. Hierarchical 
cluster analysis with these genes revealed two to three primary tumor clusters with 
expression profiles that resembled the mt-like and wt-like configurations (Fig. 4A). 
Using these tumor clusters to define patient survival groups, we analyzed disease-free 
survival (DFS) by the Kaplan-Meier estimate. As shown in Fig. 4 B and C, the clusters 
were significantly associated with tumor recurrence [P = 0.01 (two clusters, C1 and 
C2) and P = 0.005 (three clusters, C1, C2, and C3)]. Thus, concordant results in two 
independent studies suggest that functional p53 deficiency, as assessed by an 
expression readout, is predictive of outcome to hormonal therapy. 

Fig. 4. The p53 signature predicts survival in independent 
clinically diverse data sets. (A) Tumor dendrogram from 
clustering 60 tumors and 22 genes (27 probes) from Ma et al. 
(20). (B and C) Patient subgroups determined by the primary 
tumor branches (C1-C4) were analyzed for correlations with 
DFS. (D) Tumor dendrogram from clustering 76 tumors and 9 
genes from Sørlie et al. (18). Patient subgroups defined by 
the primary tumor branches (C1 and C2) were analyzed for 
correlations with disease-specific survival (DSS) (E) and DFS 
(F). (G) Tumor dendrogram from clustering 97 tumors and 21 
genes (25 probes) from van't Veer et al. (21). (H and I) 
Primary tumor clusters (C1-C5) defined patient subgroups for 
DFS analysis. Red branches denote tumors with the p53 
mt-like signature; black branches identify those with the wt-like 
signature. Black triangles indicate patients who relapsed 
within 5 years. See Fig. 7 for gene heat maps and probe IDs. 

To examine the prognostic performance of the p53 signature genes in patients treated 
with systemic chemotherapy, we used the Sørlie et al. cDNA microarray data set. The 
majority of patients (>80%) in the Sørlie study received weekly doxorubicin or 5FU and 
mitomycin and were mostly late-stage patients (10, 11). Here, the nine-gene 
partial signature, which could distinguish mt and wt tumors with 77% accuracy, 
was used to hierarchically cluster 76 well sampled tumors with corresponding 
treatment and survival data (Fig. 4D). Again, we observed the tumors cluster into two 
primary branches with expression patterns characteristic of the wt-like and mt-like 
configurations. Survival analysis resulted in a highly significant difference in outcome 
between patients with mt-like and wt-like tumors [P = 7.5 × 10⁻⁵ (disease-specific 
survival) and P = 5.0 × 10⁻⁵ (DFS); Fig. 4 E and F] despite the small number of genes 
used. Notably, Fig. 4E predicts a remarkable 5-year 90% survival rate for the 31 p53 
wt-like patients, compared with a 35% probability of 5-year survival for the 44 p53 
mt-like patients. 

Next, we tested the performance of the signature genes on a set of 97 early stage 
tumors (T1/T2, N0), from patients <55 years of age at diagnosis and treated by 
radiotherapy alone (21). From our 32-gene signature, we were able to map 25 probes 
corresponding to 21 signature genes to all 97 tumors with outcome information. 
Unsupervised clustering revealed two primary and four secondary tumor clusters (Fig. 
4G) that could significantly discern patients based on time to distant metastasis within 
a 5-year period [Fig. 4 H and I; P = 0.0006 (two clusters, C1 and C2) and P = 0.001 
(four clusters, C1, C2, C3, and C4)]. Notably, of the 24 tumors in cluster 1 (C1) that 
bear the molecular configuration of p53 mt-like tumors, 75% belonged to patients who 
developed a distant metastasis within 5 years, compared with 26% of 34 patients with 
tumors comprising C4 (which most closely resemble the p53 wt-like signature). These 
findings indicate that the p53 signature is also prognostic of recurrence in early stage, 
locally treated breast cancer. 

The p53 Signature Genes Are Not Canonical p53 Targets. To gain some 
mechanistic insights, we examined the functional annotations of the signature genes 
for clues to explain their correlations with p53 status and patient outcome. We found 
that none of the signature genes are known transcriptional targets of p53, nor have 
they been previously implicated in the p53 pathway. Moreover, promoter analysis 
revealed no evidence of p53-binding sites. Of the characterized genes, a number are 
associated with cell growth and proliferation (MYBL2, TFF1, BRRN1, CHAD, SCGB3A1, 
DACH, and CDCA8), transcription ([illegible], NY-BR-1, DACH, and MYBL2), ion transport 
(CACNG4, CYBRD1, and LRP2), and breast cancer biology (SCGB3A1, TFF1, STC2, 
NY-BR-1, and AGR2). Interestingly, MYBL2, which was transcriptionally up-regulated in the 
p53 mt-like tumors, is a growth-promoting transcription factor structurally related to 
the c-MYB oncogene. MYBL2 maps to a chromosomal region frequently amplified in 
breast cancer (20q13) and has previously been reported to be overexpressed in breast 
cancer cell lines and sporadic ovarian carcinomas (28, 29). SCGB3A1 (HIN1), which we 
observed to be down-regulated in the p53 mt-like tumors, is a putative 
tumor-suppressor gene that can inhibit breast cancer cell growth when overexpressed and 
has been found to be transcriptionally silenced by promoter hypermethylation in early 
stages of breast tumorigenesis (30). Thus, some of the p53 signature genes may 
contribute mechanistically to the poor prognosis associated with the p53 mt-like 
tumors. 

Breast cancers are characterized by multiple genetic alterations that, 
together, comprise the genotype that dictates tumor behavior. It is 
therefore reasonable that the compilation of genetic changes is a 
better indicator of clinical behavior than a single gene. Herein, we 
show that an expression signature, deduced from differences in the molecular 
configurations of p53 wt and mt tumors, predicts for p53 functional inactivation in 
primary breast cancers and provides a more accurate and useful measure of p53 
clinical functionality than p53 mutation status alone. We show that, in independent 
data sets of both breast and liver cancers and regardless of other clinical features, 
subsets of the p53 signature can predict p53 status with significant accuracy. As a 
predictor of disease-specific survival, we found that the signature significantly 
outperformed p53 mutation status in a large patient cohort with heterogeneous 
treatment. Importantly, the p53 signature could significantly distinguish patients 
having more or less benefit from specific systemic adjuvant therapies and locoregional 
radiotherapy. Recently, Ma et al. identified by microarray analysis two genes (HOXB13 
and IL17RB) whose expression ratio was predictive of tamoxifen response. Notably, we 
found that these genes were also predictive of disease-specific survival in the 67 
Uppsala patients treated with tamoxifen monotherapy (P < 0.01; data not shown). 
However, these genes were not prognostic of recurrence in the van't Veer data set, 
nor were the van't Veer 70 genes prognostic of recurrence in the Ma data set (20), 
suggesting that tumor stage and/or therapeutic context is an important determinant of 
the prognostic capacity of some genes. In contrast, we demonstrate that the p53 
signature genes are robustly prognostic of survival and recurrence in both early and 
late stage disease and in different therapeutic settings. 

Although the p53 pathway may be compromised at some level in most human cancers, 
our analysis of genes involved in the p53 pathway suggests that the p53 expression 
signature defines some operational configuration of this pathway in breast tumors 
(more so than p53 mutation status alone) that impacts patient survival and therapeutic 
response. Recent evidence suggests that tumor sensitivity to some anti-cancer agents 
may depend largely on the relative intactness of p53-dependent mechanisms of 
apoptosis (7, 8, 10, 11) and that taxols (microtubule stabilizers), in particular, may 
have greater efficacy against p53-mt breast tumors than anthracycline-based 
(genotoxic) compounds (9). Whether the p53 classifier genes identified here are 
involved in some aspect of this p53 function or will have robust clinical utility as a 
predictor of therapeutic response warrants further investigation. 

Other studies have elucidated gene expression signatures prognostic of breast cancer 
outcomes (21, 31). Although a 21-gene subset of our p53 signature could significantly 
distinguish patients with recurrent and nonrecurrent disease in the van't Veer study 
(21), none of these genes were found to overlap with the 231 genes identified as 
prognostic discriminators in the van't Veer set; and only one of the classifier genes, 
MYBL2, was found in the Sotiriou 485 survival-correlated genes (31). Similarly, none of 
the p53 signature genes were found in the top 25 relapse-associated genes reported 
by Ma et al. Thus, the p53 signature genes identified here represent a previously 
unrecognized prognostic cassette. 

In cancer, it is clear that not all p53 mutations have equal effects; some simply confer 
loss of function, whereas others have a dominant-negative effect (such as 
transdominant suppression of wt p53 or oncogenic gain of function), and still 
others show only a partial loss of function where, for example, only a fraction of p53 
target genes are dysregulated (32, 33). For these reasons, no single molecular 
assessment of p53 status appears to provide an absolute indication of the complete 
p53 function. Although the p53 classification method developed here seeks to 
categorize all tumors as either p53-deficient or not, it is likely that intermediate types 
exist with partial p53 functionality, distinguished by expression patterns that fall 
between those of the predominant mt-like and wt-like classes. Further investigation 
will be required to resolve the biological and clinical implications of such intermediate 
tumor classes. 

Chemotherapeutic options to treat tuberculosis are severely restricted 
by the intrinsic resistance of Mycobacterium tuberculosis to the 
majority of clinically applied antibiotics. Such resistance is partially 
provided by the low permeability of its unique cell envelope. Here we 
describe a complementary system that coordinates resistance to drugs that have 
penetrated the envelope, allowing mycobacteria to tolerate diverse classes of 
antibiotics that inhibit cytoplasmic targets. This system depends on whiB7, a gene that 
pathogenic Mycobacterium shares with Streptomyces, a phylogenetically related genus 
known as the source of diverse antibiotics. In M. tuberculosis, whiB7 is induced by 
subinhibitory concentrations of antibiotics (erythromycin, tetracycline, and 
streptomycin) and whiB7 null mutants (Streptomyces and Mycobacterium) are 
hypersusceptible to antibiotics in vitro. M. tuberculosis is also antibiotic sensitive within 
a monocyte model system. In addition to antibiotics, whiB7 is induced by exposure to 
fatty acids that pathogenic Mycobacterium species may accumulate internally or 
encounter within eukaryotic hosts during infection. Gene expression profiling analyses 
demonstrate that whiB7 transcription determines drug resistance by activating 
expression of a regulon including genes involved in ribosomal protection and antibiotic 
efflux. Components of the whiB7 system may serve as attractive targets for the 
identification of inhibitors that render M. tuberculosis or multidrug-resistant derivatives 
more antibiotic-sensitive. 

multidrug resistance | Streptomyces | WhiB | microarray | gene expression 

The World Health Organization has estimated that between 2000 and 2020, nearly one 
billion people will be newly infected, 200 million people will get sick, and 35 million will 
die from tuberculosis (TB) (1). It is the remarkable antibiotic tolerance of the infectious 
agent Mycobacterium tuberculosis to many commonly used broad-spectrum antibiotics 
that limits chemotherapeutic options and is the root cause of all treatment failure (2). 
Tolerance may reflect physiological adaptations that occur within the host, perhaps 
including an undefined developmental or physiological state that underlies persistent 
infection (3). As a result, patients must be treated with multiple antibiotics for 6-12 
months. Patient noncompliance or inadequate drug dosage favors the sequential 
acquisition of mutations providing resistance and the emergence of multidrug-resistant 
M. tuberculosis strains. In contrast to acquired resistance, intrinsic resistance in 
Mycobacterium has largely been attributed to its impermeable mycolic acid-containing 
cell envelope (4, 5) that is not found in many other Actinomycetes including 
Streptomyces. However, Jarlier and Nikaido (4) have also pointed out that this 
permeability barrier is insufficient to fully explain the high levels of drug resistance in 
Mycobacterium, suggesting that there must be synergistic systems effective against 
drugs that penetrate this barrier. Indeed, several mycobacterial genes not involved in 
outer envelope assembly confer resistance to specific, broad-spectrum antibiotics (6- 
8). 

Although the best-known Mycobacterium species are pathogenic, most are ubiquitous 
environmental saprophytes belonging to the Actinomycete taxon (9). The taxon also 
includes Streptomyces species, filamentous bacteria known for their extraordinary 
capacity to produce thousands of diverse antibiotics as a part of a developmental 
program leading to sporulation. Antibiotic biosynthetic genes are found in clusters that 
typically include the corresponding resistance genes to provide self-protection (10). 
However, as in other bacteria, genes scattered throughout the genome that may have 
alternative physiological roles can also confer antibiotic resistance (11). Intuitively, the 
protective activity of these resistance genes should be a prerequisite for the evolution 
of antibiotic biosynthetic pathways. 
http://www.pnas.org/cgi/content/full/102/34/12200 10/4/2005 

Indeed, at sublethal concentrations, antibiotics can induce a wide variety of genes, 
many not known to provide antibiotic resistance (12-14). The expression of some of 
these antibiotic-induced genes is under the control of stress inducible systems that 
respond to (13) or lead to (15) decreases in growth rate. The underlying control 
elements that affect antibiotic resistance include general stress-responsive sigma 
factors (16) and transcriptional activator proteins of the AraC family (MarA, SoxR, and 
Rob) (17), as well as genetic systems that provide for more specific adaptation to DNA 
damage (15) or oxidative stress (8). Such systems are commonly found in diverse 
bacterial groups and typically modulate antibiotic resistance within a rather narrow 
concentration range (8, 17). Here we describe a multidrug-resistance system that 
apparently evolved in the ancestors of antibiotic producing bacteria, which has been 
retained in saprophytes and pathogens belonging to the Actinomycete taxon. 

► Methods 

Media and Strains. Streptomyces lividans 1326 was grown in the 
nutrient-rich liquid media YEME and cultivated on NE solid media (18). 
The slow growing mycobacteria Mycobacterium bovis bacillus 
Calmette-Guerin, M. tuberculosis H37Rv, and the clinical M. 
tuberculosis isolate 1254 were propagated in 7H9 media (19), supplemented with 10% 
ADS (5% BSA/2% dextrose/0.8% sodium chloride). 

Plasmid Constructions and Mutant Analyses. Annotated whiB7 ORFs were 
deleted in the genomes of S. lividans and Streptomyces coelicolor (nucleotide 
coordinates 5,647,587-5,648,293; http://jic-bioinfo.bbsrc.ac.uk/streptomyces/ScoDB), 
M. tuberculosis H37Rv (nucleotide coordinates 3,568,405-3,568,801; 
http://genolist.pasteur.fr/TubercuList), and M. bovis bacillus Calmette-Guerin 
(nucleotide coordinates 3,523,346-3,522,950; http://genolist.pasteur.fr/BoviList) as 
described in Supporting Text and Data Set, which are published as supporting 
information on the PNAS web site. Corresponding ORFs were expressed from vector 
promoters. 

Mycobacterial Survival in Monocytes. Resting or activated J774 monocytes were 
grown in DMEM/FCS. Monocytes were exposed to Mycobacterium bovis bacillus 
Calmette-Guerin and the corresponding whiB7 mutant at a multiplicity of one. 
Activation was achieved by 16-h exposure to 500 units/ml IFN-γ followed by 4-h 
exposure to both IFN-γ and 1 ng/ml LPS. The monocytes were washed twice with PBS 
and incubated for 45 min at 37°C/5% CO2 with amikacin (200 ng/ml). Cells were again 
washed twice in PBS and incubated in DMEM/FCS. Survival was determined at the 
indicated incubation times by bacterial incorporation of tritiated uracil followed by 
liquid scintillation counting (20). For antibiotic susceptibility testing, the infected 
monocytes were incubated in the presence of indicated spectinomycin concentrations 
for 48 h before permeabilization and mycobacterial labeling (see Supporting Text for 
details). 

Microarray Expression Profiling and Analysis. Labeling of RNA and hybridizations 
to 70-mer oligonucleotide-based microarrays (Operon) were performed as described 
(21). Microarrays were scanned by using GenePix 4000A (Axon Instruments). 
Fluorescence intensities of the two channels at each spot were quantified by using the 
ScanAlyze software (http://rana.lbl.gov/EisenSoftware.htm). After data for each array 
were normalized (21), expression ratios were averaged from two biological replicates 
for antibiotic-induced cultures or from three cultures for the mid-log comparison, 
with two microarrays for each of the biological replicates. Data from each experimental 
condition were analyzed separately by using significance analysis of microarrays (22) with 
a false discovery ratio <0.3%. Significantly regulated genes for all experimental 
conditions were combined to generate a data set containing 2,879 genes. Within this 
list, gene expression data could be present for one experimental condition and absent 
from another. To aid hierarchical clustering, these genes were then filtered to include 
those that were present across 95% of the 25 experimental conditions (total number 
of rows in Fig. 3 b-d) and a differential expression >2-fold under at least three of these 
conditions. The resulting 880 filtered genes were organized according to their 
expression profiles by average linkage clustering using Genesis software 
(http://genome.tugraz.at/Software/GenesisCenter.html). 
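
The filtering rule described above (data present in at least 95% of conditions and a >2-fold change under at least three conditions) can be sketched in Python as follows. The sketch assumes that expression values are stored as ratios, that missing measurements are recorded as None, and that a ratio below 0.5 counts as a >2-fold change just as a ratio above 2 does. The gene names, profiles, and function name are hypothetical.

```python
def passes_filter(ratios, min_present=0.95, fold=2.0, min_conditions=3):
    """Return True if a gene's expression profile satisfies both filter criteria.

    ratios : one expression ratio per experimental condition, None if missing
    """
    present = [r for r in ratios if r is not None]
    if len(present) < min_present * len(ratios):
        return False
    # Count conditions with a change greater than `fold` in either direction.
    changed = sum(1 for r in present if r > fold or r < 1.0 / fold)
    return changed >= min_conditions

# Hypothetical profiles across 25 conditions (not data from the study).
profiles = {
    "geneA": [4.0] * 3 + [1.0] * 22,                  # present everywhere, 3 changes
    "geneB": [4.0] * 3 + [None] * 2 + [1.0] * 20,     # only 23 of 25 present
    "geneC": [0.25] * 2 + [1.0] * 23,                 # only 2 changes
    "geneD": [0.25, 3.0, 5.0, None] + [1.0] * 21,     # 24 present, 3 changes
}
kept = [gene for gene, ratios in profiles.items() if passes_filter(ratios)]
```

With 25 conditions, the 95% threshold means that at most one measurement may be missing for a gene to be retained.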

Analysis of Mycobacterial RNA with Quantitative Real-Time RT-PCR. Real time 
PCR to confirm microarray analysis of in vitro grown cultures was performed by using 
SYBR green (Applied Biosystems). A standard curve was generated for the relative 
quantification of all genes, and a control reaction lacking reverse transcriptase was 
performed for every RNA sample. The major housekeeping sigma factor gene sigA was 
used to normalize mRNA levels. Gene induction values were calibrated by comparison 
with the reference RNA isolated for each experiment. 
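
The relative quantification described above can be illustrated with a short Python sketch. It assumes a linear standard curve of threshold cycle against log10 quantity, normalizes each gene to sigA, and then calibrates against the reference RNA. The slope, intercept, quantities, and function names are invented for illustration and are not values from the study.

```python
def quantity_from_ct(ct, slope, intercept):
    """Invert a linear standard curve: Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)

def fold_induction(gene_sample, siga_sample, gene_reference, siga_reference):
    """sigA-normalized transcript level in a sample relative to the reference RNA."""
    return (gene_sample / siga_sample) / (gene_reference / siga_reference)

# Hypothetical standard-curve parameters and relative quantities.
unit_quantity = quantity_from_ct(35.0, slope=-3.32, intercept=35.0)
induction = fold_induction(gene_sample=8.0, siga_sample=2.0,
                           gene_reference=2.0, siga_reference=2.0)
```

Normalizing to sigA before calibrating means that differences in the amount of input RNA between the sample and the reference cancel out of the final ratio.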

Fig. 1. Identification of whiB7, a gene providing 
intrinsic antibiotic resistance in Streptomyces. (a) 
Antibiotic susceptibility of the wild-type S. lividans 
strain (Left) and the spontaneous whiB7 mutant RM1 
(Right). Etest strips were applied to seeded spores, 
and minimal inhibition concentration values were read 
from the scale (µg/ml) at the point of intersection 
between inhibition ellipse edge and the strip. Upper, 
erythromycin; right, tetracycline; lower, rifampicin; 
left, quinupristin/dalfopristin. (b) S. lividans RM1 was 
engineered to allow thiostrepton-inducible expression 
of whiB7 by using the expression plasmid pIJ8600. 
Seeded spores of S. lividans wild-type/pIJ8600 (Left), 
the whiB7 mutant RM1/pIJ8600 (Center), or 
RM1/pIJ8600::whiB7 (Right) were exposed to radial 
gradients by discs containing 100 µg of oleandomycin 
(left), chlorotetracycline (right), or thiostrepton (top). 
The suppressor colonies arising at high frequency in the 
RM1/pIJ8600::whiB7 culture are presumed to be 
promoter-up mutants that allow unregulated whiB7 
expression. Tetracycline induced synthesis of a red 
pigment (likely to be the antibiotic 
undecylprodigiosin) in both wild-type S. lividans and 
the whiB7 mutant. 

► Results 

Identification of a Multidrug Resistance Regulator in 
Streptomyces. We isolated a spontaneous mutant of S. lividans, 
RM1, that was hypersensitive to a diverse array of chemically and 
functionally unrelated clinical antibiotics that it does not synthesize, 
including chloramphenicol, fusidic acid, imipenem, lincosamides (clindamycin and 
lincomycin), macrolides (erythromycin, oleandomycin, and spiramycin), rifampicin, 
streptogramins (pristinamycin and virginiamycin), and tetracycline (see Fig. 5, which is 
published as supporting information on the PNAS web site). Sensitivities of the mutant 
and wild-type parent were quantified by using Etest diffusion strips to compare their 
minimal inhibition concentrations to four structurally and functionally distinct classes of 
antibiotics. RM1 was 600-, 400-, 25-, and 40-fold more sensitive to erythromycin, 
tetracycline, rifampicin, and the pristinamycin derivatives quinupristin/dalfopristin, 
respectively (Fig. 1a). The mutant displayed large decreases in intrinsic antibiotic 
resistance, but its susceptibility to a variety of other toxic, nonantibiotic stresses, 
including detergents, antiseptics, and oxidative stress inducers, was unchanged (Fig. 6, 
which is published as supporting information on the PNAS web site). Standard cloning, 
sequencing, and site-directed mutagenesis experiments (described in Supporting Text) 
identified the gene responsible for this multidrug resistance as whiB7 in both S. 
lividans (GenBank accession no. AF205848) and S. coelicolor genomes 
(Fig. 2) (23). Sequence comparison of the wild-type locus to that of RM1 revealed a 
frame shift in the whiB7 gene resulting from the insertion of a cytosine at nucleotide 
position 239. This locus was not linked to any recognizable antibiotic biosynthetic 
cluster. whiB7 encodes a 122-aa protein related to whiB, a putative S. coelicolor 
transcriptional regulatory gene (24). In M. tuberculosis, a WhiB-like protein (WhiB3) 
may act as a transcriptional regulator by binding and modulating the activity of RpoV 
(SigA), the principal sigma factor in M. tuberculosis (25). To further demonstrate the 
correlation of whiB7 transcriptional activity with antibiotic resistance, the S. lividans 
mutant RM1 was engineered to allow inducible expression (26) of a wild-type copy of 
whiB7. Fig. 1b shows that the circular zones of inhibition caused by diffusion of 
tetracycline (an aromatic polyketide) or oleandomycin (a macrolide) were distorted and 
reduced by a radial gradient of the inducer (thiostrepton), indicating higher levels of 
resistance associated with increased whiB7 transcription. 

Fig. 2. The orthologous whiB7 loci of Streptomyces 
and Mycobacterium. (a) Alignment of the WhiB7 
proteins from both Mycobacterium tuberculosis 
(WhiB7-tub) and leprae (WhiB7-lep) with the 
Streptomyces WhiB7 and the prototypic family 
member WhiB from Streptomyces coelicolor. Four 
absolutely conserved cysteine residues and a 
tryptophan-containing/glycine-rich motif are 
conserved throughout the WhiB family (circled). An 
A/T-Hook DNA binding consensus sequence is found 
only in WhiB7 paralogs. ~, N-terminal sequence not 
shown. (b) Gene organization of the whiB7 genomic 
region. Shaded block arrows represent conserved 
ORFs. 

Members of the whiB gene lineage, including whiB7 of S. lividans (alternatively named 
wblC), are restricted to the Actinomycetes (27); BLAST searches did not identify 
orthologs in any other published bacterial genome sequences. The prototype of this 
gene family, whiB, was identified as a developmental gene in Streptomyces species 
that is essential for the differentiation of mycelium into pigmented spores (white) (28). 
The family signature is defined by four absolutely conserved cysteines that form an 
oxygen sensitive iron sulfur cluster (ref. 29 and L.N., P. Jensen, R.P.M., M. Folcher, S. 
Durr, S. Grzesiek, and C.J.T., unpublished data), and a tryptophan within a glycine-rich 
sequence (Fig. 2a). In addition, whiB7 paralogs also encode a C-terminal "A/T Hook" 
domain that is known to bind AT-rich DNA sequences. Although the M. tuberculosis 
genome encodes seven whiB-like genes (whiB1-7), both homology and synteny predict 
a minimal core of five orthologous whiB-like genes common to M. tuberculosis, 
Mycobacterium leprae, and Streptomyces species (whiB1-4 and -7). TBLASTN searches of 
the 201 completed bacterial genomes identified whiB7 orthologs in all species of 
Streptomyces (S. coelicolor and Streptomyces avermitilis), Mycobacterium 
(tuberculosis H37Rv, tuberculosis CDC1551, bovis AF2122/97, leprae TN, avium subsp. 
paratuberculosis), and Nocardia (farcinica IFM 10152). The role of the streptomycete 
whiB7 gene in determining broad-spectrum drug resistance predicted that it might 
play a similar role in pathogenic M. tuberculosis. 

A whiB7 Ortholog Controls Multidrug Resistance in M. tuberculosis. Intrinsic 
resistance in M. tuberculosis could be partially due to a whiB7 ortholog that is able to 
provide resistance to antibiotics that have penetrated the cell envelope and entered 
the cytoplasm. To test this hypothesis, we constructed a gene replacement mutant in 
M. tuberculosis, strain H37Rv. The mutant grew normally, but was defective in its 
resistance (Table 1) to a variety of antibiotics including macrolides, a lincosamide, and 
an aminoglycoside. The whiB7 gene was cloned into the integrative vector pMV361 to 
provide expression from a strong constitutive promoter (hsp60). Integration of this 
plasmid (pRPM251) into the chromosome of the M. tuberculosis whiB7 mutant restored 
normal, or slightly elevated levels of antibiotic resistance (Table 1). Multidrug 
sensitivity also resulted from disruption of the whiB7 gene of the fast growing 
saprophytic Mycobacterium smegmatis (R.P.M. and C.J.T., unpublished results). 

Induction of the M. tuberculosis whiB7 Gene by Multiple Antibiotics and 
Fatty Acids. Antibiotic resistance genes typically confer resistance to one class of 
antibiotic and are specifically activated by the corresponding drugs. Experiments were 
carried out to determine whether the broad spectrum of resistance conferred by whiB7 
might be controlled by regulatory systems that are responsive to dissimilar drugs (30). 
Microarray transcript profiling of all annotated M. tuberculosis genes was used to 
monitor expression in response to three chemically distinct classes of common 
antibiotics: the frontline antimycobacterial drug streptomycin, as well as erythromycin 
and tetracycline (Fig. 3b). M. tuberculosis 1254 cultures were treated with five 
concentrations of each antibiotic, spanning three orders of magnitude (0.5-100 µg/ml) 
including the minimal inhibition concentration. Expression was assayed 15 min after 
exposure to maximize detection of genes whose regulation most directly reflected 
whiB7 activity, rather than downstream pleiotropic effects. whiB7 expression was 
significantly induced by subinhibitory concentrations of both erythromycin (1.0 µg/ml) 
and tetracycline (0.5 µg/ml) and also higher levels of streptomycin (25 µg/ml). After 
longer exposure (24 h), concentrations of streptomycin as low as 1 µg/ml induced 
whiB7 (Fig. 3c). The levels of induction were dose dependent for all three antibiotics 
(Fig. 3b). Activation of whiB7 transcription by tetracycline (1 µg/ml) was confirmed by 
quantitative RT-PCR showing that whiB7 mRNA levels progressively increased ≈70-fold 
during 24 h of exposure (Fig. 3e). 



Fig. 3. Identification of antibiotic resistance genes as parts of 
the M. tuberculosis whiB7 regulon. Significantly altered gene 
expression ratios from all experiments were averaged, log2 
transformed, and clustered according to the displayed color 
code. Red and blue indicate higher and lower gene 
expression, respectively, in experimental samples, in 
comparison to the reference. Black represents no difference. 
(a) Genes of the whiB7 regulon. Rv and ORF numbers 
designate annotated M. tuberculosis genes, and arrows 
indicate contiguous genes. (b) Treatment (15 min) of M. 
tuberculosis 1254 with 0.5, 1.0, 5, 25 and 100 µg/ml of 
erythromycin, streptomycin, and tetracycline. (c) Extended 
antibiotic treatment for 2 and 24 h (1 µg/ml). (d) M. 
tuberculosis H37Rv was used as reference and compared to a 
whiB7 null mutant (WhiB7 KO) and a H37Rv strain engineered 
to overexpress whiB7 (WhiB7 OV). Cells were assayed at an 
optical density of 0.4 at 600 nm. (e) whiB7 induction by fatty 
acids. Quantitative RT-PCR-determined induction factors of 
whiB7 expression. Primary y axis: M. tuberculosis grown in 
MDG fed 50 µM palmitic acid (purple) or its unsaturated form, 
oleic acid (gray). Secondary y axis: induction to prolonged 
exposure of tetracycline (connected dots) (≈2 µM, 1 µg/ml). 

Many antibiotics, including erythromycin and tetracycline, are based on polyketides, 
fatty acid-like molecules with carbon backbones synthesized by enzyme complexes 
similar to fatty acid synthase. Like antibiotics, many fatty acids are known to suppress 
growth of diverse bacteria (31), including Mycobacterium spp. (32). Palmitic acid, as 
well as an unsaturated derivative, oleic acid, were likewise tested for their abilities to 
induce whiB7 transcription by quantitative RT-PCR. Although both fatty acids activated 
whiB7 transcription, the palmitic acid response was more rapid and achieved higher 
levels of induction. Although induction kinetics were concentration dependent, at least 
for the antibiotics tested, higher concentrations of externally applied palmitic acid were 
needed and lower levels of maximal induction were achieved (3- to 4-fold compared to 
70-fold for tetracycline). 



In conclusion, whiB7 expression was progressively induced at the transcriptional level 
by sublethal concentrations of antibiotics and fatty acids. Up-regulation of whiB7 
expression may be required for the induction of other genes that could plausibly 
provide antibiotic resistance. These observations suggested that whiB7 encoded a 
regulator whose transcriptional induction activated a regulon providing intrinsic 
antibiotic resistance. 

Identification of Genes in the whiB7 Regulon by Microarray Analyses. To 
determine whether the induction of whiB7 was correlated with the expression of genes 
associated with antibiotic resistance, microarray expression profiles of mid-log phase 
cultures of the whiB7 deletion mutant and a strain overexpressing whiB7 were 
compared to parental strain M. tuberculosis H37Rv (Fig. 3d). These global analyses 
(details not presented) showed that whiB7 was the only gene induced initially, after 
exposure to minimal concentrations of antibiotic (0.5 µg/ml tetracycline for 15 min, for 
example). Thus, whiB7 represented a primary regulatory gene whose expression was 
followed by transcription of other genes in its regulon. Average distance hierarchical 
clustering identified 12 significantly regulated genes (SAM false discovery rate < 0.3%) 
whose expression profile appeared to be influenced by antibiotic exposure and the 
activity of whiB7 (Fig. 3). The whiB7-dependent set of eight transcripts includes three 
genes that may provide intrinsic antibiotic resistance: tap (Rv1258c), encoding an 
efflux pump that confers low-level resistance to aminoglycosides and tetracycline (33); 
an unstudied ORF encoding a putative macrolide transporter (Rv1473) with an 
ATP-binding cassette; and erm (Rv1988), homologous to ribosomal methyltransferases and 
conferring MLS (macrolide, lincosamide, and streptogramin) resistance by modification 
of 23S rRNA (7, 34). Although the whiB7 regulon may include unrecognized antibiotic 
resistance determinants, other functions were also suggested. These include eis 
(Rv2416c), a putative acetyl-transferase providing enhanced survival within 
macrophages, Rv0263c, a putative carboxylase catalyzing urea degradation, and cut2, 
a putative cutinase/lipase that is reported to be exposed on the outside of the cell 
membrane and potentially able to release fatty acids from external lipids (35). These 
possible functions of other genes in the putative whiB7 regulon, not known to be 
antibiotic resistance determinants, require further investigation. Some may play roles 
in bacterial physiology, a recognized but not well understood determinant of antibiotic 
resistance (36). 

Quantitative RT-PCR was used to independently confirm the induction data for genes 
within the whiB7 regulon, including Rv1258, Rv1473, and Rv1988. Furthermore, 
primers targeting an intergenic sequence upstream of whiB7 showed that whiB7 was 
transcriptionally coupled to the small upstream unannotated ORF, ORFD0316 (Fig. 7, 
which is published as supporting information on the PNAS web site). Strains 
engineered to constitutively express whiB7 in trans (whiB7 OV) were associated with 
elevated levels of ORFD0316 transcription, and ORFD0316 was down-regulated in the 
whiB7 mutant, suggesting that whiB7 positively autoregulates its own transcription. 

Multidrug Resistance in a Monocyte Model System. The physiology of 
Mycobacterium growing in laboratory cultures is much different from their natural state 
during host cell infection (36). To investigate whether whiB7 controls survival or 
antibiotic resistance in a eukaryotic cellular environment, we monitored the 
intracellular survival of M. bovis bacillus Calmette-Guerin and a constructed isogenic 
whiB7 mutant harbored within untreated or spectinomycin-treated J774, a monocyte-like 
cell line most commonly used for antibiotic sensitivity testing. In the absence of 
antibiotic, the wild type and whiB7 mutant had similar survival curves in resting or 
IFN-γ-activated J774 during the first 72 h of infection (Fig. 4a). Compared to liquid 
cultures, both bacillus Calmette-Guerin wild type and the whiB7 mutant were more 
sensitive to spectinomycin in J774. However, more importantly, in resting J774, the 
whiB7 mutant was >10-fold more sensitive to spectinomycin (as reflected by the 
concentration of antibiotic needed to reduce transcription by 50%; Fig. 4b). 



► Discussion 

The fact that the whiB7 gene and its multidrug resistance phenotype 
have been retained in most Actinomycetes, including those that have 
not been continuously exposed to antibiotics in the environment, 
provides important insights into the evolution and biological function of 
"antibiotic resistance" genes and their regulatory systems. There are 14 genes 
belonging to the whiB family in the sequenced genome of S. coelicolor (23). 
Site-directed mutagenesis of this set of genes (B. Gust and K. Chater, personal 
communication) has shown that, like whiB7, many do not have obvious sporulation 
(white) defects under standard growth conditions and that the multidrug sensitive 
phenotype is a unique characteristic of whiB7 mutants (L.N. and C.J.T., unpublished 
results). Studies of whiB paralogs in Mycobacterium species have shown that the M. 
smegmatis whiB2 gene (also called whmD) is essential (37) and that another, whiB3, 
plays a role in virulence in some model systems (25). Here we focus on the ability of 
whiB7 to determine multiple antibiotic resistances in Actinomycetes and suggest that, 
in mycobacterial species, it acts synergistically with a rather impermeable cell envelope 
to provide high levels of intrinsic resistance. 

whiB7 is notably different from systems reported in other bacteria that allow 
adaptation to a variety of different nonspecific stress conditions and may incidentally 
provide multiantibiotic resistance. whiB7 does not confer resistance to antiseptics, but 
rather to antibiotics having specific targets (see Figs. 5 and 6). whiB7 function is also 
unique in that it confers relatively high levels of resistance: in the S. lividans mutant, 
antibiotic sensitivity increased by orders of magnitude; this is distinct from the general 
stress adaptive system, which confers much lower levels of multidrug resistance (mar) 
in enteric bacteria (17, 38) by using any one of three transcriptional activators having 
highly redundant functions. 

Fig. 4. The phenotype of the whiB7 mutation in J774, a 
monocyte-derived cell line. Mycobacterial RNA transcription, 
an indicator of survival, was assayed by incorporation of 
tritiated uracil. Shown are mean values (±SD) of six to eight 
determinations. Resting (REST) or activated (ACT) J774 
monocytes growing in microtiter plates were exposed to M. 
bovis bacillus Calmette-Guerin and the corresponding whiB7 
mutant. (a) RNA synthesis of internalized WT and whiB7 
mutant cells decreased at the same rate in resting or 
activated monocytes, presumably because of bactericidal 
activity of the monocytes. In contrast, growth did occur in the 
control experiment where the bacteria were cultured in the 
same medium without monocytes (data not shown). (b) For 
antibiotic susceptibility testing, infected monocytes were 
incubated with medium containing the indicated 
concentrations of spectinomycin; % transcriptional activity is 
the percentage of incorporation rates in spectinomycin- 
treated vs. untreated bacteria. The same results were 
obtained in infected monocytes treated with amikacin after 
infection to confirm that incorporation rates reflected 
internalized mycobacteria (data not shown). 

whiB7 is a putative transcriptional activator that is induced by antibiotics and controls 
the expression of at least two documented antibiotic resistance genes. The presence of 
these structural genes and corresponding regulatory systems in Mycobacterium 
suggests that this system provides selective advantage. The retention of the multiple 
antibiotic-responsive regulatory system controlling M. tuberculosis whiB7 provides 
circumstantial evidence that toxic metabolites of various structures may have played a 
key role in directing the early evolution of the regulon to provide antibiotic resistance. 
The whiB7 gene, as well as 5 of its 10 M. tuberculosis target genes (Fig. 3a), are 
present in M. leprae (Rv1473, Rv1257c, Rv1258c, Rv0263, and Rv2725), whose 
genome has undergone dramatic reduction during evolution within metazoan 
(presumably mammalian) hosts. The presence of the functionally conserved whiB7 
locus in all Streptomyces and Mycobacterium spp. (also including saprophytic M. 
smegmatis) genomes now sequenced records its origin in their presumed soil dwelling 
ancestor. Although it is not clear why this capacity should be retained by M. 
tuberculosis and M. leprae, long after their progenitor left the antibiotic containing soil, 
some of these genes may have been adapted to protect the microbe against 
compounds of the mammalian immune system. The whiB7 system was active in a 
monocyte model system; the mutant was more sensitive to spectinomycin in J774 
(Fig. 4b). 

This evolutionary retention of whiB7, along with the observation that antibiotics with 
different structures activate it, implies a common endogenous inducer made by 
actinomycetes in response to antibiotics. Indeed, sublethal concentrations of some 
antibiotics induce synthesis of other secondary metabolites as demonstrated in 
Streptomyces (Figs. 1b and 5) that may also be autotoxic. Although Mycobacterium 
species are not recognized as antibiotic producers, they do have a remarkably large 
repertoire of polyketide biosynthesis gene clusters (39), some of which may encode 
biosynthetic pathways for autotoxic compounds. 

Fatty acids, serving as precursors for diverse lipids and for the assembly of biological 
membranes, are nevertheless toxic to a wide variety of bacteria. Unlike most other 
bacteria, and for reasons that are not well understood, Actinomycetes commonly 
synthesize and accumulate extremely large amounts (20-80% of their biomass) of 
triacyl glycerols (40). This includes the primary precursor of complex lipids, palmitic 
acid, along with several unsaturated fatty acid derivatives, oleic, linoleic (unpublished 
data) and arachidonic (unpublished data) acids. All induced whiB7, with palmitic acid 
being the most active (Fig. 3e). The fact that the whiB7 regulon, including antibiotic 
resistance genes, can be activated by palmitic acid has important implications for 
mycobacterial chemotherapy. Palmitic acid has been found in mycobacterial cytosol, 
and is considered to be a major source of carbon used by M. tuberculosis in the 
mammalian macrophage (41). It is also the principal fatty acid found in animal tissues 
and serum. Therefore, the whiB7 regulon may be induced when M. tuberculosis enters 
macrophages or other lipid rich cells, organs, or tissues, thereby allowing mycobacteria 
to more effectively resist some chemotherapeutic strategies, sheltered in specific areas 
of the body. 

The intrinsic resistance of M. tuberculosis to antibiotics during in vivo growth and 
persistence underlies the need for protracted therapy for tuberculosis (2). Knowledge 
of such inducible intrinsic mycobacterial systems could generate derivatives of 
antibiotics that might circumvent detection by whiB7 regulators or perhaps WhiB7 
inhibitors that augment conventional therapies by inactivating groups of genes that 
confer intrinsic resistance. Such developments could not only open up a powerful 
repertoire of currently redundant clinical antibiotics in the treatment of tuberculosis but 
also reduce the problematic duration of chemotherapy. 

significance analysis of microarrays (22) with a false discovery ratio < 0.3%." The reference to 
"significance analysis of microarrays (22)" is to the SAM article. 

6. All of the independent Claims of the above-identified application include the 
limitation that an expected value of the parameter is derived and compared to an observed or 
calculated value of the parameter, where the expected value is indicative of the extent of 
variations in the parameter introduced by the process by which the data (called associated values 
in the claims) are acquired. Claims 1, 44 and 58 also contain the limitation that the parameters 
of the plurality of genes be adjusted so that variables related to the parameters are substantially 
independent of variations of scatter values or average associated values of the genes over the 
sets, said scatter values defined by standard deviation of the associated values in the sets. 

7. I believe that the analyses using SAM in the above quotes from the Miller and 
Morris articles employ the two claim limitations of paragraph 6 above through the use of SAM 
software. 

8. I further declare that all statements made herein of my own knowledge are true 
and that all statements made on information and belief are believed to be true, and further that 
these statements were made with the knowledge that willful false statements and the like so 
made are punishable by fine or imprisonment, or both under §1001 of Title 18 of the United 
States Code, and that such willful false statements may jeopardize the validity of the application 
or any patent issuing thereon. 



Dated: 